Meta Llama 3.1 8B Instruct

Overview

| Attribute | Value |
|---|---|
| Model Name | Meta: Llama 3.1 8B Instruct |
| Model ID | meta-llama/llama-3.1-8b-instruct |
| Creator | Meta (meta-llama) |
| Release Date | July 23, 2024 |
| Parameters | 8 billion |
| Architecture | Auto-regressive transformer with Grouped-Query Attention (GQA) |
| Context Length | 131,072 tokens (128K) |
| Knowledge Cutoff | December 2023 |

Description

Llama 3.1 8B Instruct is the smallest instruction-tuned model in Meta's Llama 3.1 family, balancing efficiency and capability. The 8-billion-parameter variant emphasizes speed and low serving cost while, in Meta's human evaluations, remaining competitive with leading closed-source models.

The model was fine-tuned using:

  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning with Human Feedback (RLHF)
  • Over 25 million synthetically generated examples plus human-generated data

Supported Languages

  • English
  • German
  • French
  • Italian
  • Portuguese
  • Hindi
  • Spanish
  • Thai

Technical Specifications

Model Architecture

  • Type: Auto-regressive transformer
  • Attention: Grouped-Query Attention (GQA) for improved inference scalability
  • Input: multilingual text
  • Output: multilingual text and code
  • Prompt Format: Llama 3 instruct (see the sketch below)
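
For orientation, a minimal sketch of the Llama 3 instruct prompt layout. This is the string the tokenizer's chat template renders before tokenization; each turn ends with <|eot_id|>, one of the default stop tokens listed below:

# Llama 3 instruct layout: a system message plus one user turn.
# tokenizer.apply_chat_template(...) produces this string automatically.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Who are you?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)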

Training Data

| Attribute | Value |
|---|---|
| Context Window | 131,072 tokens (128K) |
| Training Data Size | ~15 trillion tokens from publicly available sources |
| Fine-tuning Data | >25M synthetically generated examples + human-generated data |
| Data Source | New mix of publicly available online data |
| Training Infrastructure | Custom Meta GPU cluster |

Training Compute

| Metric | Value |
|---|---|
| GPU Hours | 1.46M GPU hours |
| Hardware | H100-80GB GPUs (700W TDP) |
| Location-Based Emissions | ~420 tons CO2eq |

Supported Parameters

| Parameter | Supported |
|---|---|
| max_tokens | Yes |
| temperature | Yes |
| top_p | Yes |
| top_k | Yes |
| stop | Yes |
| frequency_penalty | Yes |
| presence_penalty | Yes |
| repetition_penalty | Yes |
| seed | Yes |
| min_p | Yes |
| response_format | Yes |
| tools | Yes |
| tool_choice | Yes |
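
A sketch of exercising several of these parameters through the LangMart chat completions endpoint shown later on this page. It assumes an OpenAI-style request body, as the curl examples below suggest; the parameter values are illustrative, not recommendations:

import os
import requests

# Illustrative request using several of the supported sampling parameters.
resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Summarize GQA in one sentence."}],
        "max_tokens": 128,
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
        "repetition_penalty": 1.1,
        "seed": 42,
    },
)
print(resp.json()["choices"][0]["message"]["content"])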

Default Stop Tokens

  • <|eot_id|>
  • <|end_of_text|>

Features

| Feature | Status |
|---|---|
| Tool Calling | Supported |
| Multipart Requests | Supported |
| Abortable Requests | Supported |
| Reasoning Capabilities | Not Supported |

Related Models

| Model | Parameters | Context | Use Case |
|---|---|---|---|
| Llama 3.1 70B Instruct | 70B | 128K | Higher capability |
| Llama 3.1 405B Instruct | 405B | 128K | Maximum capability |
| Llama 3.2 1B Instruct | 1B | 128K | Edge deployment |
| Llama 3.2 3B Instruct | 3B | 128K | Mobile/edge |
| Llama 3.3 70B Instruct | 70B | 128K | Latest 70B model |
| Llama 3.2 Vision | Various | Various | Multimodal (image + text) |

Providers

DeepInfra (Turbo) - Primary Provider

| Attribute | Value |
|---|---|
| Provider Slug | deepinfra/turbo |
| Model ID at Provider | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo |
| Base URL | https://api.langmart.ai/v1/openai |
| Region | US |
| Data Training | No |
| Prompt Retention | No |
| Terms | https://deepinfra.com/terms |
| Privacy | https://deepinfra.com/privacy |

Pricing (via LangMart)

Standard Tier (DeepInfra Turbo)

| Type | Price per Million Tokens |
|---|---|
| Input | $0.02 |
| Output | $0.03 |

  • Quantization: FP8
  • Max Completion Tokens: 16,384
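
As a quick sanity check of these rates, a back-of-envelope cost calculation (the token counts are illustrative):

# Standard Tier rates, in dollars per million tokens.
INPUT_RATE, OUTPUT_RATE = 0.02, 0.03

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the rates above."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# A 10,000-token prompt with a 2,000-token completion:
print(f"${request_cost(10_000, 2_000):.6f}")  # -> $0.000260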

Free Tier

| Type | Price per Million Tokens |
|---|---|
| Input | $0.00 |
| Output | $0.00 |

  • Model ID: meta-llama/llama-3.1-8b-instruct:free
  • Limited rate/quota applies

Performance Benchmarks

Instruction-Tuned Model Performance

| Category | Benchmark | Metric | Llama 3.1 8B |
|---|---|---|---|
| General | MMLU (5-shot) | macro_avg/acc | 69.4% |
|  | MMLU (CoT) | macro_avg/acc | 73.0% |
|  | IFEval | accuracy | 80.4% |
| Reasoning | ARC-C | accuracy | 83.4% |
|  | GPQA | exact match | 30.4% |
|  | GPQA Diamond | exact match | 31.8% |
| Code | HumanEval | pass@1 | 72.6% |
|  | MBPP++ | pass@1 | 72.8% |
| Math | GSM-8K (CoT) | exact match | 84.5% |
|  | MATH (CoT) | final_em | 51.9% |
| Tool Use | API-Bank | accuracy | 82.6% |
|  | BFCL | accuracy | 76.1% |
|  | BFCL v2 | accuracy | 65.4% |
| Multilingual | MGSM (CoT) | exact match | 68.9% |

Comparison with Larger Models

| Benchmark | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|
| MMLU (CoT) | 73.0% | 86.0% | 88.6% |
| IFEval | 80.4% | 87.5% | 88.6% |
| HumanEval | 72.6% | 80.5% | 89.0% |
| MATH (CoT) | 51.9% | 68.0% | 73.8% |
| MGSM | 68.9% | 86.9% | 91.6% |

Hardware Requirements

Inference

Use device_map="auto" for automatic device placement:

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Load weights in bfloat16 and let accelerate place layers across
# available devices automatically.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
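
As a back-of-envelope sizing estimate (not a measured figure): 8 × 10⁹ parameters at 2 bytes each in bfloat16 is roughly 16 GB for weights alone, before activations and the KV cache; 8-bit quantization halves that to about 8 GB, and 4-bit brings it to roughly 4-5 GB.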

Quantization Options

8-bit Quantization:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantize linear layers to 8-bit at load time; modules left unquantized
# stay in bfloat16.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)

4-bit Quantization:

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
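
A fuller 4-bit setup, reusing the imports from the 8-bit example above, might look like the following. The NF4 quantization type and bfloat16 compute dtype are our assumptions following common bitsandbytes usage, not a model-specific recommendation:

# Hedged sketch: NF4 4-bit weights with bfloat16 compute.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)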

Tool Use

The model supports function calling via the tokenizer's chat template: define the tool as a typed, documented Python function and pass it through the tools argument:

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location (format: "City, Country")
    Returns:
        Temperature as a float
    """
    return 22.0

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a weather bot."},
    {"role": "user", "content": "What's the temperature in Paris?"}
]

# Render the conversation plus the tool schema into a Llama 3.1 prompt.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True
)

Tool Response Handling

# After model generates tool call
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# Append tool result
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

License

License: Llama 3.1 Community License Agreement

Key Terms

  • Grants a non-exclusive, worldwide, non-transferable, royalty-free limited license
  • Covers the rights to use, reproduce, distribute, and copy the Llama Materials
  • Covers the rights to create derivative works of, and modify, the Llama Materials

Commercial Requirements

  • If monthly active users exceed 700M, you must request a license from Meta
  • Must include "Built with Llama" on related websites, user interfaces, and documentation
  • Must include "Llama" at the beginning of any AI model name built with Llama 3.1

Attribution Required

Llama 3.1 is licensed under the Llama 3.1 Community License,
Copyright Meta Platforms, Inc. All Rights Reserved.

Prohibited Uses

  • Violence, terrorism, and illegal activities
  • Child exploitation and abuse material
  • Human trafficking and sexual violence
  • Harassment and bullying
  • Discrimination in employment, credit, housing
  • Unauthorized professional practice (legal, medical, financial)
  • Malware and malicious code creation
  • Fraud, disinformation, and defamation
  • Impersonation and misrepresentation
  • Violations of ITAR, biological/chemical weapons regulations

Safety Considerations

Critical Risk Mitigation Areas

  1. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive) Materials - uplift testing to assess proliferation risks
  2. Child Safety - expert red teaming across supported languages
  3. Cyber Attack Enablement - evaluation of hacking-task capabilities

Safety Tools

| Tool | Purpose |
|---|---|
| Llama Guard 3 | Input/output filtering |
| Prompt Guard | Prompt injection detection |
| Code Shield | Code security analysis |

Multilinguality Caution

Safety thresholds have been met only for the eight supported languages listed above. Deploying the model in other languages is strongly discouraged unless you first implement:

  • Fine-tuning for the target language
  • System controls aligned with your use-case policies

API Usage Examples

LangMart API

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
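
The same request through the OpenAI Python SDK, assuming (as the base URL in the provider table suggests) that the endpoint is OpenAI-compatible:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",  # assumption: OpenAI-compatible endpoint
    api_key=os.environ["LANGMART_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(completion.choices[0].message.content)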

Free Tier

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

DeepInfra Direct

curl https://api.langmart.ai/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Python with Transformers

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The pipeline returns the full conversation; the last entry is the
# assistant's reply as a {"role", "content"} dict.
print(outputs[0]["generated_text"][-1])

Resources


Issue Reporting

| Issue Type | Contact |
|---|---|
| Model Issues | https://github.com/meta-llama/llama-models/issues |
| Risky Content | developers.facebook.com/llama_output_feedback |
| Security Bugs | facebook.com/whitehat/info |
| Policy Violations | LlamaUseReport@meta.com |


Last updated: December 23, 2024
Source: LangMart, Hugging Face, Meta Model Card
Verified: Data confirmed accurate via LangMart API scrape