Overview
| Attribute | Value |
|---|---|
| Model Name | Meta: Llama 3.3 70B Instruct |
| Model ID | meta-llama/llama-3.3-70b-instruct |
| Creator | Meta (meta-llama) |
| Release Date | December 6, 2024 |
| Parameters | 70 billion |
| Architecture | Auto-regressive transformer with Grouped-Query Attention (GQA) |
| Context Length | 131,072 tokens (128K) |
| Knowledge Cutoff | December 2023 |
Description
Llama 3.3 70B Instruct is a pretrained and instruction-tuned generative model optimized for multilingual dialogue use cases. It outperforms many available open source and closed chat models on common industry benchmarks.
The model was fine-tuned using:
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning with Human Feedback (RLHF)
- Over 25 million synthetically generated examples plus human-generated data
Supported Languages
- English
- German
- French
- Italian
- Portuguese
- Hindi
- Spanish
- Thai
Technical Specifications
Model Architecture
- Type: Auto-regressive transformer
- Attention: Grouped-Query Attention (GQA) for improved inference scalability
- Input/Output: Text only
- Instruction Type: Llama3
Training Data
| Attribute | Value |
|---|---|
| Context Window | 128,000 tokens |
| Training Data Size | ~15 trillion tokens from publicly available sources |
| Fine-tuning Data | >25M synthetically generated examples + human-generated data |
| Training Infrastructure | Custom Meta GPU cluster |
Training Compute
| Metric | Value |
|---|---|
| GPU Hours | 7.0M GPU hours |
| Hardware | H100-80GB GPUs (700W TDP) |
| Total Compute | 39.3M cumulative GPU hours |
| Location-Based Emissions | 11,390 tons CO2eq |
| Market-Based Emissions | 0 tons CO2eq (100% renewable energy) |
Supported Parameters
| Parameter | Supported |
|---|---|
| max_tokens | Yes |
| temperature | Yes |
| top_p | Yes |
| top_k | Yes |
| stop | Yes |
| frequency_penalty | Yes |
| presence_penalty | Yes |
| repetition_penalty | Yes |
| seed | Yes |
| min_p | Yes |
| response_format | Yes |
| tools | Yes |
| tool_choice | Yes |
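All of the parameters above travel in a standard OpenAI-style request body. A minimal sketch of such a body (the model ID and key names follow the API examples later in this document; the specific values chosen here are illustrative, not recommended defaults):

```python
import json

# Hypothetical request body exercising several of the supported parameters.
payload = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "seed": 42,                       # makes sampling reproducible
    "stop": ["<|eot_id|>"],           # one of the model's default stop tokens
    "response_format": {"type": "json_object"},
}

# Serialize for the HTTP request body.
body = json.dumps(payload)
```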
Default Stop Tokens
- <|eot_id|>
- <|end_of_text|>
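For context, <|eot_id|> terminates each message turn in the Llama 3 chat format, which is why it serves as a default stop token. A hand-built sketch of that layout (in practice the tokenizer's apply_chat_template produces this; the exact template is the tokenizer's responsibility, so treat this as illustrative only):

```python
# Illustrative reconstruction of the Llama 3 instruct prompt layout.
# Always use tokenizer.apply_chat_template in real code.
def render_turn(role: str, content: str) -> str:
    """Render one chat turn with Llama 3 header tokens."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

prompt = (
    "<|begin_of_text|>"
    + render_turn("system", "You are a helpful assistant.")
    + render_turn("user", "Hello!")
    # Open the assistant header so generation starts here:
    + "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```

The model emits its reply after the open assistant header and signals completion with <|eot_id|>.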
Features
| Feature | Status |
|---|---|
| Tool Calling | Supported |
| Multipart Requests | Supported |
| Abortable Requests | Supported |
| Reasoning Capabilities | Not Supported |
Model Comparison
| Model | Parameters | Context | Use Case |
|---|---|---|---|
| Llama 3.1 8B Instruct | 8B | 128K | Lightweight deployment |
| Llama 3.1 70B Instruct | 70B | 128K | Previous generation |
| Llama 3.1 405B Instruct | 405B | 128K | Maximum capability |
| Llama 3.2 Vision | Various | Various | Multimodal (image + text) |
Providers
DeepInfra (Turbo) - Primary Provider
ModelRun - Free Tier Provider
| Attribute | Value |
|---|---|
| Provider | ModelRun |
| Region | US |
| Trains on User Data | No |
| Retains Prompts | No |
Pricing (via LangMart)
Standard Tier (DeepInfra Turbo)
| Type | Price per Million Tokens |
|---|---|
| Input | $0.10 |
| Output | $0.32 |
- Quantization: FP8
- Max Completion Tokens: 16,384
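At these rates, per-request cost is easy to estimate. A small sketch using the standard-tier prices from the table above (the token counts in the example are hypothetical):

```python
# Standard-tier (DeepInfra Turbo) prices from the table above, USD per million tokens.
INPUT_PRICE = 0.10
OUTPUT_PRICE = 0.32

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at standard-tier rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
cost = request_cost(2_000, 500)  # ~$0.00036
```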
Free Tier (ModelRun)
| Type | Price per Million Tokens |
|---|---|
| Input | $0.00 |
| Output | $0.00 |
- Model ID: meta-llama/llama-3.3-70b-instruct:free
- Rate and quota limits apply
English Text Instruction-Tuned Models Comparison
| Benchmark | Category | Llama 3.1 8B | Llama 3.1 70B | Llama 3.3 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| MMLU (CoT) | Knowledge | 73.0% | 86.0% | 86.0% | 88.6% |
| MMLU Pro (CoT) | Knowledge | 48.3% | 66.4% | 68.9% | 73.3% |
| IFEval | Steerability | 80.4% | 87.5% | 92.1% | 88.6% |
| GPQA Diamond | Reasoning | 31.8% | 48.0% | 50.5% | 49.0% |
| HumanEval | Code | 72.6% | 80.5% | 88.4% | 89.0% |
| MBPP EvalPlus | Code | 72.8% | 86.0% | 87.6% | 88.6% |
| MATH (CoT) | Math | 51.9% | 68.0% | 77.0% | 73.8% |
| BFCL v2 | Tool Use | 65.4% | 77.5% | 77.3% | 81.1% |
| MGSM | Multilingual | 68.9% | 86.9% | 91.1% | 91.6% |
- 92.1% on IFEval (instruction following) - exceeds the 405B model (88.6%)
- 88.4% on HumanEval (code generation) - near 405B performance (89.0%)
- 77.0% on MATH (CoT) - exceeds the 405B model (73.8%)
- 91.1% on MGSM (multilingual) - nearly matches the 405B model (91.6%)
Hardware Requirements
Inference
Use device_map="auto" for automatic device placement:
```python
import transformers
import torch

model_id = "meta-llama/Llama-3.3-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
```
Quantization Options
8-bit Quantization:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
```
4-bit Quantization:
```python
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
```
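The practical effect of quantization is on weight memory. A rough back-of-envelope for the 70B parameter count (this ignores activation memory, the KV cache, and quantization overhead, so real requirements are higher):

```python
PARAMS = 70e9  # 70 billion parameters

def weight_memory_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16_gb = weight_memory_gb(16)  # ~140 GB: needs multiple GPUs
int8_gb = weight_memory_gb(8)   # ~70 GB
int4_gb = weight_memory_gb(4)   # ~35 GB: fits on a single 40-48 GB GPU
```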
Function Calling
The model supports function calling with the following pattern:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location (format: "City, Country")

    Returns:
        Temperature as a float
    """
    return 22.0

messages = [
    {"role": "system", "content": "You are a weather bot."},
    {"role": "user", "content": "What's the temperature in Paris?"},
]

# Pass the function itself; its signature and docstring become the tool schema.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
)

# After the model generates a tool call:
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# Append the tool result:
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```
License
License: Llama 3.3 Community License Agreement
Key Terms
- Non-exclusive, worldwide, non-transferable, royalty-free limited license
- Use, reproduce, distribute, copy, create derivative works
- Modify the Llama Materials
Commercial Requirements
- If monthly active users exceed 700M, you must request a license from Meta
- Must include "Built with Llama" on related websites, user interfaces, and documentation
- Must include "Llama" at the beginning of any AI model name built with Llama 3.3
Attribution Required
Llama 3.3 is licensed under the Llama 3.3 Community License,
Copyright Meta Platforms, Inc. All Rights Reserved.
Prohibited Uses
- Violence, terrorism, and illegal activities
- Child exploitation and abuse material
- Human trafficking and sexual violence
- Harassment and bullying
- Discrimination in employment, credit, housing
- Unauthorized professional practice (legal, medical, financial)
- Malware and malicious code creation
- Fraud, disinformation, and defamation
- Impersonation and misrepresentation
- Violations of ITAR, biological/chemical weapons regulations
Safety Considerations
Critical Risk Mitigation Areas
- CBRNE Materials - Uplift testing to assess proliferation risks
- Child Safety - Expert red teaming across supported languages
- Cyber Attack Enablement - Hacking task capability evaluation
Recommended Safeguards
| Tool | Purpose |
|---|---|
| Llama Guard 3 | Input/output filtering |
| Prompt Guard | Prompt injection detection |
| Code Shield | Code security analysis |
Multilinguality Caution
The model supports seven non-English languages for which safety thresholds were met. Use in unsupported languages is strongly discouraged without:
- Fine-tuning
- System controls aligned with use case policies
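One simple system-level control is to gate requests on the supported-language list. A hypothetical sketch (language detection itself is out of scope here; the ISO 639-1 codes below are my mapping of the languages listed earlier in this document):

```python
# ISO 639-1 codes for the eight supported languages:
# English, German, French, Italian, Portuguese, Hindi, Spanish, Thai.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "it", "pt", "hi", "es", "th"}

def is_supported(language_code: str) -> bool:
    """Return True if the detected request language is on the supported list."""
    return language_code.lower() in SUPPORTED_LANGUAGES
```

A gateway could reject or route to a fallback any request whose detected language fails this check.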
API Usage Examples
LangMart API
```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Free Tier
```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
DeepInfra Direct
```bash
curl https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Resources
Official Documentation
Issue Reporting
Last updated: December 2024
Source: LangMart, Hugging Face, Meta Model Card