Meta Llama 3.1 405B Instruct

Overview

Attribute Value
Model Name Meta: Llama 3.1 405B Instruct
Model ID meta-llama/llama-3.1-405b-instruct
Creator Meta (meta-llama)
Release Date July 23, 2024
Parameters 405 billion
Architecture Auto-regressive transformer with Grouped-Query Attention (GQA)
Context Length 130,815 tokens (~128K)
Knowledge Cutoff December 2023

Description

Llama 3.1 405B Instruct is Meta's largest and most capable open-source language model and the flagship of the Llama 3.1 series. This 405-billion-parameter model has a 128K-token context window and performs competitively with leading closed-source models, including GPT-4o and Claude 3.5 Sonnet, on standard benchmarks.

The model was fine-tuned using:

  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning with Human Feedback (RLHF)
  • Over 25 million synthetically generated examples plus human-generated data

Supported Languages

  • English
  • German
  • French
  • Italian
  • Portuguese
  • Hindi
  • Spanish
  • Thai

Technical Specifications

Model Architecture

  • Type: Auto-regressive transformer
  • Attention: Grouped-Query Attention (GQA) for improved inference scalability
  • Input/Output: Text only
  • Instruction Type: Llama3
  • Vocabulary Size: 128,256 tokens
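
The vocabulary size can be read off the published tokenizer. A minimal sketch using Hugging Face transformers (assumes access to the gated meta-llama repository has been granted):

from transformers import AutoTokenizer

# Gated repo: requires accepting the Llama 3.1 license on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct")
print(len(tokenizer))  # 128256, matching the vocabulary size above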

Training Data

Attribute Value
Context Window 128,000 tokens
Training Data Size ~15 trillion tokens from publicly available sources
Fine-tuning Data >25M synthetically generated examples + human-generated data
Training Infrastructure Custom Meta GPU cluster

Training Compute

Metric Value
GPU Hours ~30.8M GPU hours
Hardware H100-80GB GPUs (700W TDP)
Total Compute Approximately 3.8 x 10^25 FLOPs
Location-Based Emissions ~8,930 tons CO2eq
Market-Based Emissions 0 tons CO2eq (100% renewable energy)
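
As a rough cross-check, the energy implied by these figures is GPU hours × TDP, a lower bound that ignores host machines and cooling:

# Rough lower bound: GPU hours x per-GPU TDP only.
gpu_hours = 30.8e6          # ~30.8M GPU hours
tdp_kw = 0.7                # H100-80GB at 700 W
print(f"~{gpu_hours * tdp_kw / 1e6:.1f} GWh")  # ~21.6 GWh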

Supported Parameters

Parameter Supported
max_tokens Yes
temperature Yes
top_p Yes
top_k Yes
stop Yes
frequency_penalty Yes
presence_penalty Yes
repetition_penalty Yes
logit_bias Yes
min_p Yes
tools Yes
tool_choice Yes
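
All of these can be sent through the OpenAI-compatible endpoint. A minimal sketch with the OpenAI Python SDK; the values are illustrative, and passing the non-standard samplers (top_k, min_p, repetition_penalty) via extra_body is an assumption about how LangMart accepts them:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="YOUR_LANGMART_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-405b-instruct",
    messages=[{"role": "user", "content": "Name three prime numbers."}],
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    stop=["<|eot_id|>"],
    # Samplers outside the OpenAI schema go in the raw request body.
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.1},
)
print(response.choices[0].message.content)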

Default Stop Tokens

  • <|eot_id|>
  • <|end_of_text|>
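
Both markers come from the Llama 3 prompt format: <|eot_id|> ends each message turn, and <|end_of_text|> ends the sequence. Rendering the chat template makes this visible (same gated-tokenizer assumption as above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-405B-Instruct")
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)
# The rendered string wraps each turn in <|start_header_id|>...<|end_header_id|>
# and terminates it with <|eot_id|>.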

Features

Feature Status
Tool Calling Supported
Multipart Requests Supported
Abortable Requests Supported
Reasoning Capabilities Not Supported
JSON Mode Supported
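
JSON mode is exposed through the OpenAI-compatible response_format field; a minimal sketch (assuming LangMart honors response_format for this model):

from openai import OpenAI

client = OpenAI(base_url="https://api.langmart.ai/v1", api_key="YOUR_LANGMART_API_KEY")

# response_format constrains the model to emit a valid JSON object.
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-405b-instruct",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": 'Give two Llama 3.1 sizes as {"sizes": []}.'},
    ],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)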

Model Family

Model Parameters Context Use Case
Llama 3.1 8B Instruct 8B 128K Lightweight deployment
Llama 3.1 70B Instruct 70B 128K Balanced performance/cost
Llama 3.1 405B Instruct 405B 128K Maximum capability
Llama 3.2 Vision Various Various Multimodal (image + text)
Llama 3.3 70B Instruct 70B 128K Next-gen optimized 70B

Providers

Together AI - Primary Provider

Attribute Value
Provider Slug together
Model ID at Provider meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
Base URL https://api.langmart.ai/v1
Region US
Data Training No
Prompt Retention No
Terms https://www.together.ai/terms-of-service

Additional Providers

The Llama 3.1 405B model is also available through:

  • Fireworks AI - High-performance inference
  • Lepton AI - Alternative hosting
  • Novita AI - Cost-effective option

Pricing (via LangMart)

Standard Tier (Together AI)

Type Price per Million Tokens
Input $3.50
Output $3.50
  • Quantization: FP8
  • Provider Model ID: meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo

Free Tier

Type Price per Million Tokens
Input $0.00
Output $0.00
  • Model ID: meta-llama/llama-3.1-405b-instruct:free
  • Limited rate/quota applies

Performance Benchmarks

Comparison with Other Leading Models

Benchmark Category Llama 3.1 70B Llama 3.1 405B GPT-4o Claude 3.5 Sonnet
MMLU (CoT) Knowledge 86.0% 88.6% 88.7% 88.7%
MMLU Pro (CoT) Knowledge 66.4% 73.3% - -
IFEval Steerability 87.5% 88.6% - -
GPQA Diamond Reasoning 48.0% 49.0% - -
HumanEval Code 80.5% 89.0% 90.2% 92.0%
MBPP EvalPlus Code 86.0% 88.6% - -
MATH (CoT) Math 68.0% 73.8% 76.6% 78.3%
BFCL v2 Tool Use 77.5% 81.1% - -
MGSM Multilingual 86.9% 91.6% - -

Performance Highlights

  • Flagship open-source model with 405B parameters
  • Competitive with GPT-4o and Claude 3.5 Sonnet across major benchmarks
  • 91.6% on MGSM (multilingual reasoning)
  • 89.0% on HumanEval (code generation)
  • 81.1% on BFCL v2 (tool use / function calling)

Hardware Requirements

Inference

The 405B model requires significant GPU resources for inference:

Configuration VRAM Required
FP16 (full precision) ~810 GB
FP8 (quantized) ~405 GB
INT4 (quantized) ~203 GB
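
These figures are weights-only (parameter count × bytes per weight); KV cache and activations add more on top:

# Weights-only VRAM: parameters x bytes per weight.
params = 405e9
for precision, bytes_per_weight in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    print(f"{precision}: ~{params * bytes_per_weight / 1e9:g} GB")
# FP16: ~810 GB, FP8: ~405 GB, INT4: ~202.5 GB (≈203)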

Serving at 8-bit precision (used by most providers) typically requires:

  • 8x H100 80GB GPUs (native FP8 support), or
  • 8x A100 80GB GPUs (Ampere lacks native FP8; use INT8 or another weight-only quantization instead), or
  • A multi-node setup with smaller GPUs

Using vLLM

from vllm import LLM, SamplingParams

# Load the FP8 checkpoint: the BF16 weights (~810 GB) do not fit on a
# single 8x 80GB node, while the FP8 weights (~405 GB) do.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    tensor_parallel_size=8,  # shard across 8 GPUs
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=2048,
)

outputs = llm.generate(["Hello, how are you?"], sampling_params)
print(outputs[0].outputs[0].text)

Tool Use

The model supports function calling with the following pattern:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct"
)

def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Get the current weather in a given location.

    Args:
        location: The city and country (e.g., "Paris, France")
        unit: Temperature unit ("celsius" or "fahrenheit")
    Returns:
        Weather information as a dictionary
    """
    return {"temperature": 22, "condition": "sunny"}

messages = [
    {"role": "system", "content": "You are a helpful weather assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
]

# transformers renders the function signature and docstring into the
# Llama 3.1 tool-use prompt format.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt",
)

Tool Response Handling

# After model generates tool call
tool_call = {
    "name": "get_current_weather",
    "arguments": {"location": "Paris, France", "unit": "celsius"}
}
messages.append({
    "role": "assistant",
    "tool_calls": [{"type": "function", "function": tool_call}]
})

# Append tool result
messages.append({
    "role": "tool",
    "name": "get_current_weather",
    "content": '{"temperature": 22, "condition": "sunny"}'
})
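
With the tool result appended, the conversation is rendered again and the model produces its final natural-language answer. A minimal sketch of that last step, reusing the tokenizer from above and assuming a loaded `model` handle (hypothetical here, given the hardware the 405B requires):

# Re-render the conversation, now including the tool result.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt",
)
# `model` is assumed to be an already-loaded causal LM.
output_ids = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))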

License

License: Llama 3.1 Community License Agreement

Key Terms

  • Non-exclusive, worldwide, non-transferable, royalty-free limited license
  • Use, reproduce, distribute, copy, create derivative works
  • Modify the Llama Materials

Commercial Requirements

  • If monthly active users exceed 700M, you must request a license from Meta
  • Must include "Built with Llama" on related websites, user interfaces, and documentation
  • Must include "Llama" at the beginning of any AI model name built with Llama 3.1

Attribution Required

Llama 3.1 is licensed under the Llama 3.1 Community License,
Copyright © Meta Platforms, Inc. All Rights Reserved.

Prohibited Uses

  • Violence, terrorism, and illegal activities
  • Child exploitation and abuse material
  • Human trafficking and sexual violence
  • Harassment and bullying
  • Discrimination in employment, credit, housing
  • Unauthorized professional practice (legal, medical, financial)
  • Malware and malicious code creation
  • Fraud, disinformation, and defamation
  • Impersonation and misrepresentation
  • Violations of ITAR, biological/chemical weapons regulations

Safety Considerations

Critical Risk Mitigation Areas

  1. CBRNE Materials - Uplift testing to assess proliferation risks
  2. Child Safety - Expert red teaming across supported languages
  3. Cyber Attack Enablement - Hacking task capability evaluation

Safety Tools

Meta recommends pairing deployments with its open safeguard tools:

Tool Purpose
Llama Guard 3 Input/output filtering
Prompt Guard Prompt injection detection
Code Shield Code security analysis

Multilinguality Caution

The model supports seven non-English languages for which Meta's safety thresholds were met. Use in unsupported languages is strongly discouraged without:

  • Fine-tuning
  • System controls aligned with use case policies

Data Policy

Policy Status
Training on User Data No
Prompt Retention No
Acceptable Use Policy https://llama.meta.com/llama3/use-policy/

API Usage Examples

LangMart API

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-405b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
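
The response follows the standard OpenAI chat completions schema. An abridged, illustrative body (all values are placeholders):

{
  "id": "chatcmpl-...",
  "model": "meta-llama/llama-3.1-405b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "I'm doing well, thank you!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 9, "total_tokens": 23}
}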

Free Tier

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-405b-instruct:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Together AI Direct

curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key="YOUR_LANGMART_API_KEY"
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-405b-instruct",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
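
For long generations the same endpoint can stream tokens as they arrive, which also pairs naturally with the abortable-requests feature noted above; a minimal sketch:

from openai import OpenAI

client = OpenAI(base_url="https://api.langmart.ai/v1", api_key="YOUR_LANGMART_API_KEY")

# stream=True yields incremental deltas; breaking out of the loop
# (or closing the stream) ends the request early.
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-405b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about llamas"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)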

Resources

Official Documentation

  • Meta Llama: https://llama.meta.com
  • Model card: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
  • Hugging Face: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

Issue Reporting

Issue Type Contact
Model Issues https://github.com/meta-llama/llama-models/issues
Risky Content developers.facebook.com/llama_output_feedback
Security Bugs facebook.com/whitehat/info
Policy Violations LlamaUseReport@meta.com


Last updated: December 2024 · Sources: LangMart, Hugging Face, Meta Model Card