Overview
| Attribute | Value |
|---|---|
| Model Name | Meta: Llama 3.3 70B Instruct |
| Model ID | meta-llama/llama-3.3-70b-instruct |
| Creator | Meta (meta-llama) |
| Release Date | December 6, 2024 |
| Parameters | 70 billion |
| Architecture | Auto-regressive transformer with Grouped-Query Attention (GQA) |
| Context Length | 131,072 tokens (128K) |
| Knowledge Cutoff | December 2023 |
Description
Llama 3.3 70B Instruct is a pretrained and instruction-tuned generative model optimized for multilingual dialogue use cases. It outperforms many available open source and closed chat models on common industry benchmarks.
The model was fine-tuned using:
- Supervised Fine-Tuning (SFT)
- Reinforcement Learning with Human Feedback (RLHF)
- Over 25 million synthetically generated examples plus human-generated data
Supported Languages
- English
- German
- French
- Italian
- Portuguese
- Hindi
- Spanish
- Thai
Technical Specifications
Model Architecture
- Type: Auto-regressive transformer
- Attention: Grouped-Query Attention (GQA) for improved inference scalability
- Input/Output: Text only
- Instruction Type: Llama3
Training Data
| Attribute | Value |
|---|---|
| Context Window | 128,000 tokens |
| Training Data Size | ~15 trillion tokens from publicly available sources |
| Fine-tuning Data | >25M synthetically generated examples + human-generated data |
| Training Infrastructure | Custom Meta GPU cluster |
Training Compute
| Metric | Value |
|---|---|
| GPU Hours | 7.0M GPU hours |
| Hardware | H100-80GB GPUs (700W TDP) |
| Total Compute | 39.3M cumulative GPU hours |
| Location-Based Emissions | 11,390 tons CO2eq |
| Market-Based Emissions | 0 tons CO2eq (100% renewable energy) |
Supported Parameters
| Parameter | Supported |
|---|---|
| max_tokens | Yes |
| temperature | Yes |
| top_p | Yes |
| top_k | Yes |
| stop | Yes |
| frequency_penalty | Yes |
| presence_penalty | Yes |
| repetition_penalty | Yes |
| seed | Yes |
| min_p | Yes |
| response_format | Yes |
| tools | Yes |
| tool_choice | Yes |
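All of the parameters above travel in a standard OpenAI-style request body. A minimal sketch of such a body (the model ID and key names follow the API examples later in this document; the specific values chosen here are illustrative, not recommended defaults):

```python
import json

# Hypothetical request body exercising several of the supported parameters.
payload = {
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "seed": 42,                       # makes sampling reproducible
    "stop": ["<|eot_id|>"],           # one of the model's default stop tokens
    "response_format": {"type": "json_object"},
}

# Serialize for the HTTP request body.
body = json.dumps(payload)
```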
Default Stop Tokens
- <|eot_id|>
- <|end_of_text|>
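For context, <|eot_id|> terminates each message turn in the Llama 3 chat format, which is why it serves as a default stop token. A hand-built sketch of that layout (in practice the tokenizer's apply_chat_template produces this; the exact template is the tokenizer's responsibility, so treat this as illustrative only):

```python
# Illustrative reconstruction of the Llama 3 instruct prompt layout.
# Always use tokenizer.apply_chat_template in real code.
def render_turn(role: str, content: str) -> str:
    """Render one chat turn with Llama 3 header tokens."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

prompt = (
    "<|begin_of_text|>"
    + render_turn("system", "You are a helpful assistant.")
    + render_turn("user", "Hello!")
    # Open the assistant header so generation starts here:
    + "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```

The model emits its reply after the open assistant header and signals completion with <|eot_id|>.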
Features
| Feature | Status |
|---|---|
| Tool Calling | Supported |
| Multipart Requests | Supported |
| Abortable Requests | Supported |
| Reasoning Capabilities | Not Supported |
Model Comparison
| Model | Parameters | Context | Use Case |
|---|---|---|---|
| Llama 3.1 8B Instruct | 8B | 128K | Lightweight deployment |
| Llama 3.1 70B Instruct | 70B | 128K | Previous generation |
| Llama 3.1 405B Instruct | 405B | 128K | Maximum capability |
| Llama 3.2 Vision | Various | Various | Multimodal (image + text) |
Providers
DeepInfra (Turbo) - Primary Provider
ModelRun - Free Tier Provider
| Attribute | Value |
|---|---|
| Provider | ModelRun |
| Region | US |
| Trains on User Data | No |
| Retains Prompts | No |
Pricing (via LangMart)
Standard Tier (DeepInfra Turbo)
| Type | Price per Million Tokens |
|---|---|
| Input | $0.10 |
| Output | $0.32 |
- Quantization: FP8
- Max Completion Tokens: 16,384
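At these rates, per-request cost is easy to estimate. A small sketch using the standard-tier prices from the table above (the token counts in the example are hypothetical):

```python
# Standard-tier (DeepInfra Turbo) prices from the table above, USD per million tokens.
INPUT_PRICE = 0.10
OUTPUT_PRICE = 0.32

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at standard-tier rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token completion:
cost = request_cost(2_000, 500)  # ~$0.00036
```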
Free Tier (ModelRun)
| Type | Price per Million Tokens |
|---|---|
| Input | $0.00 |
| Output | $0.00 |
- Model ID: meta-llama/llama-3.3-70b-instruct:free
- Rate and quota limits apply
English Text Instruction-Tuned Models Comparison
| Benchmark | Category | Llama 3.1 8B | Llama 3.1 70B | Llama 3.3 70B | Llama 3.1 405B |
|---|---|---|---|---|---|
| MMLU (CoT) | Knowledge | 73.0% | 86.0% | 86.0% | 88.6% |
| MMLU Pro (CoT) | Knowledge | 48.3% | 66.4% | 68.9% | 73.3% |
| IFEval | Steerability | 80.4% | 87.5% | 92.1% | 88.6% |
| GPQA Diamond | Reasoning | 31.8% | 48.0% | 50.5% | 49.0% |
| HumanEval | Code | 72.6% | 80.5% | 88.4% | 89.0% |
| MBPP EvalPlus | Code | 72.8% | 86.0% | 87.6% | 88.6% |
| MATH (CoT) | Math | 51.9% | 68.0% | 77.0% | 73.8% |
| BFCL v2 | Tool Use | 65.4% | 77.5% | 77.3% | 81.1% |
| MGSM | Multilingual | 68.9% | 86.9% | 91.1% | 91.6% |
- 92.1% on IFEval (instruction following) - exceeds the 405B model (88.6%)
- 88.4% on HumanEval (code generation) - near 405B performance (89.0%)
- 77.0% on MATH (CoT) - exceeds the 405B model (73.8%)
- 91.1% on MGSM (multilingual) - nearly matches the 405B model (91.6%)
Hardware Requirements
Inference
Use device_map="auto" for automatic device placement:
```python
import transformers
import torch

model_id = "meta-llama/Llama-3.3-70B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
```
Quantization Options
8-bit Quantization:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
```
4-bit Quantization:
```python
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
```
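The practical effect of quantization is on weight memory. A rough back-of-envelope for the 70B parameter count (this ignores activation memory, the KV cache, and quantization overhead, so real requirements are higher):

```python
PARAMS = 70e9  # 70 billion parameters

def weight_memory_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16_gb = weight_memory_gb(16)  # ~140 GB: needs multiple GPUs
int8_gb = weight_memory_gb(8)   # ~70 GB
int4_gb = weight_memory_gb(4)   # ~35 GB: fits on a single 40-48 GB GPU
```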
Function Calling
The model supports function calling with the following pattern:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location (format: "City, Country")

    Returns:
        Temperature as a float
    """
    return 22.0

messages = [
    {"role": "system", "content": "You are a weather bot."},
    {"role": "user", "content": "What's the temperature in Paris?"},
]

# Pass the function itself; its signature and docstring become the tool schema.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
)

# After the model generates a tool call:
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# Append the tool result:
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```
License
License: Llama 3.3 Community License Agreement
Key Terms
- Non-exclusive, worldwide, non-transferable, royalty-free limited license
- Use, reproduce, distribute, copy, create derivative works
- Modify the Llama Materials
Commercial Requirements
- If monthly active users exceed 700M, you must request a license from Meta
- Must include "Built with Llama" on related websites, user interfaces, and documentation
- Must include "Llama" at the beginning of any AI model name built with Llama 3.3
Attribution Required
Llama 3.3 is licensed under the Llama 3.3 Community License,
Copyright Meta Platforms, Inc. All Rights Reserved.
Prohibited Uses
- Violence, terrorism, and illegal activities
- Child exploitation and abuse material
- Human trafficking and sexual violence
- Harassment and bullying
- Discrimination in employment, credit, housing
- Unauthorized professional practice (legal, medical, financial)
- Malware and malicious code creation
- Fraud, disinformation, and defamation
- Impersonation and misrepresentation
- Violations of ITAR, biological/chemical weapons regulations
Safety Considerations
Critical Risk Mitigation Areas
- CBRNE Materials - Uplift testing to assess proliferation risks
- Child Safety - Expert red teaming across supported languages
- Cyber Attack Enablement - Hacking task capability evaluation
Recommended Safeguards
| Tool | Purpose |
|---|---|
| Llama Guard 3 | Input/output filtering |
| Prompt Guard | Prompt injection detection |
| Code Shield | Code security analysis |
Multilinguality Caution
The model supports seven non-English languages for which safety thresholds were met. Use in unsupported languages is strongly discouraged without:
- Fine-tuning
- System controls aligned with use case policies
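One simple system-level control is to gate requests on the supported-language list. A hypothetical sketch (language detection itself is out of scope here; the ISO 639-1 codes below are my mapping of the languages listed earlier in this document):

```python
# ISO 639-1 codes for the eight supported languages:
# English, German, French, Italian, Portuguese, Hindi, Spanish, Thai.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "it", "pt", "hi", "es", "th"}

def is_supported(language_code: str) -> bool:
    """Return True if the detected request language is on the supported list."""
    return language_code.lower() in SUPPORTED_LANGUAGES
```

A gateway could reject or route to a fallback any request whose detected language fails this check.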
API Usage Examples
LangMart API
```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Free Tier
```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
DeepInfra Direct
```bash
curl https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
Resources
Official Documentation
Issue Reporting
Last updated: December 2024
Source: LangMart, Hugging Face, Meta Model Card