## Overview

| Attribute | Value |
|---|---|
| Model Name | Meta: Llama 3.1 8B Instruct |
| Model ID | `meta-llama/llama-3.1-8b-instruct` |
| Creator | Meta (meta-llama) |
| Release Date | July 23, 2024 |
| Parameters | 8 billion |
| Architecture | Auto-regressive transformer with Grouped-Query Attention (GQA) |
| Context Length | 131,072 tokens (128K) |
| Knowledge Cutoff | December 2023 |
## Description

Llama 3.1 8B Instruct is part of Meta's Llama 3.1 family of language models, balancing efficiency and capability. This 8-billion-parameter instruction-tuned variant emphasizes speed and efficiency while delivering performance comparable to leading closed-source models in human evaluations.

The model was fine-tuned using:

- Supervised Fine-Tuning (SFT)
- Reinforcement Learning from Human Feedback (RLHF)
- Over 25 million synthetically generated examples plus human-generated data
## Supported Languages
- English
- German
- French
- Italian
- Portuguese
- Hindi
- Spanish
- Thai
## Technical Specifications

### Model Architecture

- Type: Auto-regressive transformer
- Attention: Grouped-Query Attention (GQA) for improved inference scalability (see the sketch after this list)
- Input/Output: Multilingual text in / text and code out
- Instruction Type: Llama3
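Since GQA is the architecture's key attention optimization, a minimal sketch of the mechanism may help: several query heads share one key/value head, which shrinks the KV cache at inference time. The head counts below are illustrative, not Llama 3.1 8B's exact configuration.

```python
import torch

def grouped_query_attention(q, k, v):
    """Toy GQA: each KV head is shared by a group of query heads."""
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_q_heads // n_kv_heads
    # Expand each KV head so `group` query heads attend to the same K/V
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 16, 64)  # 8 query heads
k = torch.randn(1, 2, 16, 64)  # only 2 KV heads need to be cached
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```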
### Training Data

| Attribute | Value |
|---|---|
| Context Window | 131,072 tokens (128K) |
| Training Data Size | ~15 trillion tokens from publicly available sources |
| Fine-tuning Data | >25M synthetically generated examples + human-generated data |
| Data Source | New mix of publicly available online data |
| Training Infrastructure | Custom Meta GPU cluster |
### Training Compute

| Metric | Value |
|---|---|
| GPU Hours | 1.46M GPU hours |
| Hardware | H100-80GB GPUs (700W TDP) |
| Location-Based Emissions | ~420 tons CO2eq |
## Supported Parameters

| Parameter | Supported |
|---|---|
| `max_tokens` | Yes |
| `temperature` | Yes |
| `top_p` | Yes |
| `top_k` | Yes |
| `stop` | Yes |
| `frequency_penalty` | Yes |
| `presence_penalty` | Yes |
| `repetition_penalty` | Yes |
| `seed` | Yes |
| `min_p` | Yes |
| `response_format` | Yes |
| `tools` | Yes |
| `tool_choice` | Yes |
### Default Stop Tokens

- `<|eot_id|>`
- `<|end_of_text|>`
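As a sketch of how these parameters and stop tokens come together in a request, here is a call through an OpenAI-compatible client. The endpoint shape is inferred from the curl examples later on this page; passing provider-specific knobs such as `min_p` via `extra_body` is an assumption, so check your provider's documentation.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",  # inferred from the curl examples below
    api_key=os.environ["LANGMART_API_KEY"],
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize GQA in one sentence."}],
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    seed=42,
    stop=["<|eot_id|>"],  # usually redundant: this is already a default stop token
    extra_body={"min_p": 0.05, "repetition_penalty": 1.1},  # provider-specific fields
)
print(resp.choices[0].message.content)
```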
## Features

| Feature | Status |
|---|---|
| Tool Calling | Supported |
| Multipart Requests | Supported |
| Abortable Requests | Supported |
| Reasoning Capabilities | Not Supported |
## Related Models

| Model | Parameters | Context | Use Case |
|---|---|---|---|
| Llama 3.1 70B Instruct | 70B | 128K | Higher capability |
| Llama 3.1 405B Instruct | 405B | 128K | Maximum capability |
| Llama 3.2 1B Instruct | 1B | 128K | Edge deployment |
| Llama 3.2 3B Instruct | 3B | 128K | Mobile/edge |
| Llama 3.3 70B Instruct | 70B | 128K | Latest 70B model |
| Llama 3.2 Vision | Various | Various | Multimodal (image + text) |
## Providers

### DeepInfra (Turbo) - Primary Provider
## Pricing (via LangMart)

### Standard Tier (DeepInfra Turbo)

| Type | Price per Million Tokens |
|---|---|
| Input | $0.02 |
| Output | $0.03 |

- Quantization: FP8
- Max Completion Tokens: 16,384
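For a rough sense of what these rates mean per request, a quick back-of-envelope calculation:

```python
# Standard-tier rates from the table above, expressed per token
INPUT_USD_PER_TOKEN = 0.02 / 1_000_000
OUTPUT_USD_PER_TOKEN = 0.03 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN

# A 2,000-token prompt with a 500-token completion costs $0.000055
print(f"${request_cost(2_000, 500):.6f}")
```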
### Free Tier

| Type | Price per Million Tokens |
|---|---|
| Input | $0.00 |
| Output | $0.00 |

- Model ID: `meta-llama/llama-3.1-8b-instruct:free`
- Limited rate/quota applies
## Benchmarks

| Category | Benchmark | Metric | Llama 3.1 8B |
|---|---|---|---|
| General | MMLU (5-shot) | macro_avg/acc | 69.4% |
| | MMLU (CoT) | macro_avg/acc | 73.0% |
| | IFEval | accuracy | 80.4% |
| Reasoning | ARC-C | accuracy | 83.4% |
| | GPQA | exact match | 30.4% |
| | GPQA Diamond | exact match | 31.8% |
| Code | HumanEval | pass@1 | 72.6% |
| | MBPP++ | pass@1 | 72.8% |
| Math | GSM-8K (CoT) | exact match | 84.5% |
| | MATH (CoT) | final_em | 51.9% |
| Tool Use | API-Bank | accuracy | 82.6% |
| | BFCL | accuracy | 76.1% |
| | BFCL v2 | accuracy | 65.4% |
| Multilingual | MGSM (CoT) | exact match | 68.9% |
### Comparison with Larger Models

| Benchmark | Llama 3.1 8B | Llama 3.1 70B | Llama 3.1 405B |
|---|---|---|---|
| MMLU (CoT) | 73.0% | 86.0% | 88.6% |
| IFEval | 80.4% | 87.5% | 88.6% |
| HumanEval | 72.6% | 80.5% | 89.0% |
| MATH (CoT) | 51.9% | 68.0% | 73.8% |
| MGSM | 68.9% | 86.9% | 91.6% |
## Hardware Requirements

### Inference

Use `device_map="auto"` for automatic device placement:

```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Load weights in bfloat16 and let accelerate spread layers across devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
```
### Quantization Options

**8-bit Quantization:**

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
```

**4-bit Quantization:**

```python
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
```
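For completeness, a fuller 4-bit load might look like the following. The NF4 and double-quantization settings are common community defaults, not values prescribed by the model card.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",
    quantization_config=quantization_config,
)
```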
## Tool Calling

The model supports function calling with the following pattern:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The location (format: "City, Country")
    Returns:
        Temperature as a float
    """
    return 22.0

messages = [
    {"role": "system", "content": "You are a weather bot."},
    {"role": "user", "content": "What's the temperature in Paris?"},
]

# Render the prompt with the tool schema included
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
)

# After the model generates a tool call, echo it back into the history
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

# Append the tool result so the model can compose the final answer
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```
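From here, a typical loop re-applies the chat template to the updated `messages` (tool result included) and generates again, so the model can turn the tool output into its final natural-language answer.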
## License

**License:** Llama 3.1 Community License Agreement

### Key Terms

- Non-exclusive, worldwide, non-transferable, royalty-free limited license
- Use, reproduce, distribute, copy, create derivative works
- Modify the Llama Materials

### Commercial Requirements

- If monthly active users exceed 700M, you must request a license from Meta
- Must include "Built with Llama" on related websites, user interfaces, and documentation
- Must include "Llama" at the beginning of any AI model name built with Llama 3.1

### Attribution Required

Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
### Prohibited Uses
- Violence, terrorism, and illegal activities
- Child exploitation and abuse material
- Human trafficking and sexual violence
- Harassment and bullying
- Discrimination in employment, credit, housing
- Unauthorized professional practice (legal, medical, financial)
- Malware and malicious code creation
- Fraud, disinformation, and defamation
- Impersonation and misrepresentation
- Violations of ITAR, biological/chemical weapons regulations
## Safety Considerations

### Critical Risk Mitigation Areas

- CBRNE Materials - Uplift testing to assess proliferation risks
- Child Safety - Expert red teaming across supported languages
- Cyber Attack Enablement - Hacking task capability evaluation

### Recommended Safeguards

| Tool | Purpose |
|---|---|
| Llama Guard 3 | Input/output filtering |
| Prompt Guard | Prompt injection detection |
| Code Shield | Code security analysis |
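As an illustration of input filtering with Llama Guard 3, a minimal sketch follows. The model ID and chat-template usage follow the standard `transformers` pattern; verify the details against the Llama Guard 3 model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-8B"  # assumed model ID; check the model card
tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Screen a user prompt before it reaches Llama 3.1 8B Instruct
chat = [{"role": "user", "content": "How do I pick a lock?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(guard.device)
out = guard.generate(input_ids, max_new_tokens=32)

# Llama Guard answers "safe" or "unsafe" plus the violated category codes
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```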
### Multilinguality Caution

The model meets safety thresholds for its 8 supported languages. Use in unsupported languages is strongly discouraged without:

- Fine-tuning for the target language
- System controls aligned with your use-case policies
## API Usage Examples

### LangMart API

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### Free Tier

```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct:free",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

### DeepInfra Direct

```bash
curl https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```
### Local Inference (Transformers)

```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The pipeline returns the full chat history; the last entry is the reply
print(outputs[0]["generated_text"][-1])
```
## Resources

### Official Documentation

### Issue Reporting
Last updated: December 23, 2024
Source: LangMart, Hugging Face, Meta Model Card
Verified: Data confirmed accurate via LangMart API scrape