
Meta Llama 3.3 70B Instruct

Overview

| Attribute | Value |
|---|---|
| Model Name | Meta: Llama 3.3 70B Instruct |
| Model ID | meta-llama/llama-3.3-70b-instruct |
| Creator | Meta (meta-llama) |
| Release Date | December 6, 2024 |
| Parameters | 70 billion |
| Architecture | Auto-regressive transformer with Grouped-Query Attention (GQA) |
| Context Length | 131,072 tokens (128K) |
| Knowledge Cutoff | December 2023 |

Description

Llama 3.3 70B Instruct is a pretrained and instruction-tuned generative model optimized for multilingual dialogue use cases. It outperforms many available open-source and closed chat models on common industry benchmarks.

The model was fine-tuned using:

  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning with Human Feedback (RLHF)
  • Over 25 million synthetically generated examples plus human-generated data

Supported Languages

  • English
  • German
  • French
  • Italian
  • Portuguese
  • Hindi
  • Spanish
  • Thai

Technical Specifications

Model Architecture

  • Type: Auto-regressive transformer
  • Attention: Grouped-Query Attention (GQA) for improved inference scalability
  • Input/Output: Text only
  • Instruction Type: Llama3
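
The Grouped-Query Attention mentioned above lets several query heads share one key/value head, shrinking the KV cache during inference. A minimal NumPy sketch of the idea (head counts and dimensions here are small toy values, not Llama 3.3's actual configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_q_heads, n_kv_heads):
    """GQA: groups of query heads share one key/value head,
    reducing KV-cache size by a factor of n_q_heads / n_kv_heads."""
    group = n_q_heads // n_kv_heads
    # Repeat each KV head so every query head has a matching K/V.
    k = np.repeat(k, group, axis=0)          # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over keys
    return weights @ v                        # (n_q_heads, seq, d)

# Toy shapes: 8 query heads share 2 KV heads (groups of 4).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 5, 16))
k = rng.standard_normal((2, 5, 16))
v = rng.standard_normal((2, 5, 16))
out = grouped_query_attention(q, k, v, n_q_heads=8, n_kv_heads=2)
print(out.shape)  # (8, 5, 16)
```

The K/V tensors have only 2 heads, yet the output keeps the full 8 query heads, which is where the inference-scalability benefit comes from.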

Training Data

| Attribute | Value |
|---|---|
| Context Window | 128,000 tokens |
| Training Data Size | ~15 trillion tokens from publicly available sources |
| Fine-tuning Data | >25M synthetically generated examples + human-generated data |
| Training Infrastructure | Custom Meta GPU cluster |

Training Compute

| Metric | Value |
|---|---|
| GPU Hours | 7.0M GPU hours |
| Hardware | H100-80GB GPUs (700W TDP) |
| Total Compute | 39.3M cumulative GPU hours |
| Location-Based Emissions | 11,390 tons CO2eq |
| Market-Based Emissions | 0 tons CO2eq (100% renewable energy) |

Supported Parameters

| Parameter | Supported |
|---|---|
| `max_tokens` | Yes |
| `temperature` | Yes |
| `top_p` | Yes |
| `top_k` | Yes |
| `stop` | Yes |
| `frequency_penalty` | Yes |
| `presence_penalty` | Yes |
| `repetition_penalty` | Yes |
| `seed` | Yes |
| `min_p` | Yes |
| `response_format` | Yes |
| `tools` | Yes |
| `tool_choice` | Yes |
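
A request body exercising several of the parameters above might look like the following sketch. The model ID matches this page; the parameter values themselves are arbitrary illustrations, not recommended defaults:

```python
import json

# Sketch of an OpenAI-style chat-completion payload using the
# sampling parameters listed in the table above.
payload = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
        {"role": "user", "content": "Summarize grouped-query attention in one sentence."}
    ],
    "max_tokens": 256,
    "temperature": 0.7,        # softens the token distribution
    "top_p": 0.9,              # nucleus sampling cutoff
    "top_k": 40,
    "repetition_penalty": 1.1,
    "seed": 42,                # best-effort reproducible sampling
    "stop": ["<|eot_id|>"],
}
body = json.dumps(payload)
```

The resulting JSON string is what gets POSTed to the chat-completions endpoint shown in the API examples later on this page.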

Default Stop Tokens

  • `<|eot_id|>`
  • `<|end_of_text|>`
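
When working with raw completions rather than a managed API, generated text can be cut at the first default stop token. A minimal sketch:

```python
DEFAULT_STOPS = ["<|eot_id|>", "<|end_of_text|>"]

def truncate_at_stop(text, stops=DEFAULT_STOPS):
    """Cut generated text at the earliest occurring stop token, if any."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("Hello!<|eot_id|>leftover tokens"))  # Hello!
```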

Features

| Feature | Status |
|---|---|
| Tool Calling | Supported |
| Multipart Requests | Supported |
| Abortable Requests | Supported |
| Reasoning Capabilities | Not Supported |

Related Models

| Model | Parameters | Context | Use Case |
|---|---|---|---|
| Llama 3.1 8B Instruct | 8B | 128K | Lightweight deployment |
| Llama 3.1 70B Instruct | 70B | 128K | Previous generation |
| Llama 3.1 405B Instruct | 405B | 128K | Maximum capability |
| Llama 3.2 Vision | Various | Various | Multimodal (image + text) |

Providers

DeepInfra (Turbo) - Primary Provider

| Attribute | Value |
|---|---|
| Provider Slug | deepinfra/turbo |
| Model ID at Provider | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Base URL | https://api.langmart.ai/v1/openai |
| Region | US |
| Data Training | No |
| Prompt Retention | No |
| Terms | https://deepinfra.com/terms |
| Privacy | https://deepinfra.com/privacy |

ModelRun - Free Tier Provider

| Attribute | Value |
|---|---|
| Provider | ModelRun |
| Region | US |
| Data Training | No |
| Prompt Retention | No |

Pricing (via LangMart)

Standard Tier (DeepInfra Turbo)

| Type | Price per Million Tokens |
|---|---|
| Input | $0.10 |
| Output | $0.32 |

  • Quantization: FP8
  • Max Completion Tokens: 16,384

Free Tier (ModelRun)

| Type | Price per Million Tokens |
|---|---|
| Input | $0.00 |
| Output | $0.00 |

  • Model ID: meta-llama/llama-3.3-70b-instruct:free
  • Limited rate/quota applies
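
Given the per-million-token prices above, the cost of a request on the standard tier can be estimated as:

```python
INPUT_PRICE = 0.10   # USD per 1M input tokens (standard tier)
OUTPUT_PRICE = 0.32  # USD per 1M output tokens

def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost of one request on the DeepInfra Turbo tier."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # $0.000360
```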

Performance Benchmarks

English Text Instruction-Tuned Models Comparison

| Benchmark | Category | Llama 3.1 8B | Llama 3.1 70B | Llama 3.3 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| MMLU (CoT) | Knowledge | 73.0% | 86.0% | 86.0% | 88.6% |
| MMLU Pro (CoT) | Knowledge | 48.3% | 66.4% | 68.9% | 73.3% |
| IFEval | Steerability | 80.4% | 87.5% | 92.1% | 88.6% |
| GPQA Diamond | Reasoning | 31.8% | 48.0% | 50.5% | 49.0% |
| HumanEval | Code | 72.6% | 80.5% | 88.4% | 89.0% |
| MBPP EvalPlus | Code | 72.8% | 86.0% | 87.6% | 88.6% |
| MATH (CoT) | Math | 51.9% | 68.0% | 77.0% | 73.8% |
| BFCL v2 | Tool Use | 65.4% | 77.5% | 77.3% | 81.1% |
| MGSM | Multilingual | 68.9% | 86.9% | 91.1% | 91.6% |

Performance Highlights

  • 92.1% on IFEval (instruction following) - exceeds the 405B model (88.6%)
  • 88.4% on HumanEval (code generation) - near 405B performance (89.0%)
  • 77.0% on MATH (CoT) - exceeds the 405B model (73.8%)
  • 91.1% on MGSM (multilingual) - nearly matches the 405B model (91.6%)

Hardware Requirements

Inference

Use device_map="auto" for automatic device placement:

```python
import transformers
import torch

model_id = "meta-llama/Llama-3.3-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # shard across available GPUs/CPU automatically
)
```

Quantization Options

8-bit Quantization:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
```

4-bit Quantization:

```python
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
```
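
For better quality at 4 bits, the NF4 variant with bf16 compute is a common choice. A sketch of that configuration (whether these settings fit a given GPU budget depends on your setup):

```python
from transformers import BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
```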

Tool Use

The model supports function calling with the following pattern:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location (format: "City, Country")
    Returns:
        Temperature as a float
    """
    return 22.0

messages = [
    {"role": "system", "content": "You are a weather bot."},
    {"role": "user", "content": "What's the temperature in Paris?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
)
```

Tool Response Handling

```python
# After the model generates a tool call
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# Append the tool result
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```
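
The snippets above form a round trip: the model emits a tool call, the application executes it, and the result is appended for a final generation pass. A model-free sketch of that loop, using the same message shapes (the dispatch table is illustrative, not a library API):

```python
def get_current_temperature(location: str) -> float:
    """Stub tool: always reports 22.0 degrees."""
    return 22.0

TOOLS = {"get_current_temperature": get_current_temperature}

def run_tool_call(messages, tool_call):
    """Execute a model-emitted tool call and append both the call
    and its result to the conversation, Llama-3-style."""
    messages.append({"role": "assistant",
                     "tool_calls": [{"type": "function", "function": tool_call}]})
    result = TOOLS[tool_call["name"]](**tool_call["arguments"])
    messages.append({"role": "tool", "name": tool_call["name"],
                     "content": str(result)})
    return messages

messages = [{"role": "user", "content": "What's the temperature in Paris?"}]
tool_call = {"name": "get_current_temperature",
             "arguments": {"location": "Paris, France"}}
run_tool_call(messages, tool_call)
print(messages[-1]["content"])  # 22.0
```

After this, `messages` is ready for a second `apply_chat_template` + generate pass so the model can phrase the final answer.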

License

License: Llama 3.3 Community License Agreement

Key Terms

  • Non-exclusive, worldwide, non-transferable, royalty-free limited license
  • Use, reproduce, distribute, copy, create derivative works
  • Modify the Llama Materials

Commercial Requirements

  • If monthly active users exceed 700M, you must request a license from Meta
  • Must include "Built with Llama" on related websites, user interfaces, and documentation
  • Must include "Llama" at the beginning of any AI model name built with Llama 3.3

Attribution Required

Llama 3.3 is licensed under the Llama 3.3 Community License,
Copyright Meta Platforms, Inc. All Rights Reserved.

Prohibited Uses

  • Violence, terrorism, and illegal activities
  • Child exploitation and abuse material
  • Human trafficking and sexual violence
  • Harassment and bullying
  • Discrimination in employment, credit, housing
  • Unauthorized professional practice (legal, medical, financial)
  • Malware and malicious code creation
  • Fraud, disinformation, and defamation
  • Impersonation and misrepresentation
  • Violations of ITAR, biological/chemical weapons regulations

Safety Considerations

Critical Risk Mitigation Areas

  1. CBRNE Materials - Uplift testing to assess proliferation risks
  2. Child Safety - Expert red teaming across supported languages
  3. Cyber Attack Enablement - Hacking task capability evaluation

Safety Tools

| Tool | Purpose |
|---|---|
| Llama Guard 3 | Input/output filtering |
| Prompt Guard | Prompt injection detection |
| Code Shield | Code security analysis |

Multilinguality Caution

The model supports seven non-English languages for which safety thresholds have been met. Use in unsupported languages is strongly discouraged without:

  • Fine-tuning
  • System controls aligned with use case policies

API Usage Examples

LangMart API

```shell
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Free Tier

```shell
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

DeepInfra Direct

```shell
curl https://api.langmart.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Resources

Official Documentation

Issue Reporting

| Issue Type | Contact |
|---|---|
| Model Issues | https://github.com/meta-llama/llama-models/issues |
| Risky Content | developers.facebook.com/llama_output_feedback |
| Security Bugs | facebook.com/whitehat/info |
| Policy Violations | LlamaUseReport@meta.com |


Last updated: December 2024
Source: LangMart, Hugging Face, Meta Model Card