Source: LangMart
Overview
| Property | Value |
|---|---|
| Model ID | meta-llama/llama-4-maverick |
| Full Name | Meta: Llama 4 Maverick |
| Short Name | Llama 4 Maverick |
| Author | meta-llama |
| Release Date | April 5, 2025 |
| License | Llama 4 Community License |
Description
A high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total parameters). The model supports multilingual text and image inputs, producing text and code outputs across 12 languages.
Key features:
- Early fusion for native multimodality, enabling seamless vision-language integration
- Instruction-tuned for assistant-like behavior and vision-language tasks
- Trained on approximately 22 trillion tokens from public, licensed, and Meta-platform sources
- Knowledge cutoff: August 2024
- Released under the Llama 4 Community License for both research and commercial use
Technical Specifications
| Specification | Value |
|---|---|
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | 400B |
| Active Parameters | 17B per forward pass |
| Number of Experts | 128 |
| Context Length | 1,048,576 tokens |
| Max Completion Tokens | 16,384 |
| Training Data | ~22 trillion tokens |
| Knowledge Cutoff | August 2024 |
Pricing
| Type | Cost |
|---|---|
| Input | $0.15 per million tokens |
| Output | $0.60 per million tokens |
| Image | $0.0006684 per image |
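The per-request cost follows directly from these rates. A minimal sketch of the arithmetic (rates copied from the table above; the token counts are illustrative):

```python
# Rates in USD, taken from the pricing table above.
INPUT_PER_M = 0.15       # $ per million input tokens
OUTPUT_PER_M = 0.60      # $ per million output tokens
IMAGE_PRICE = 0.0006684  # $ per image

def estimate_cost(input_tokens: int, output_tokens: int, images: int = 0) -> float:
    """Return the estimated cost of one request, in USD."""
    return (
        input_tokens / 1_000_000 * INPUT_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PER_M
        + images * IMAGE_PRICE
    )

# Example: 12,000 prompt tokens, 800 completion tokens, one image.
cost = estimate_cost(12_000, 800, images=1)
```

For this example the total is $0.0018 (input) + $0.00048 (output) + $0.0006684 (image) ≈ $0.0029.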
Capabilities
| Capability | Supported |
|---|---|
| Text Generation | Yes |
| Vision/Image Understanding | Yes |
| Tool Calling | Yes |
| Structured Outputs | Yes |
| Trainable (Text) | Yes |
| Reasoning | No |
Supported Parameters
| Parameter | Description |
|---|---|
| temperature | Controls randomness in generation |
| top_p | Nucleus sampling threshold |
| top_k | Top-k sampling |
| max_tokens | Maximum tokens to generate |
| stop | Stop sequences |
| frequency_penalty | Penalize frequent tokens |
| presence_penalty | Penalize tokens already present |
| repetition_penalty | Alternative repetition penalty |
| seed | Random seed for reproducibility |
| min_p | Minimum probability threshold |
| response_format | Structured output format |
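As a sketch, these parameters are sent as top-level fields of the request body. The values below are illustrative only, not tuned recommendations:

```python
# Illustrative request body exercising the sampling parameters listed above.
# Parameter values are examples, not recommendations.
payload = {
    "model": "meta-llama/llama-4-maverick",
    "messages": [
        {"role": "user", "content": "Summarize MoE architectures in two sentences."}
    ],
    "temperature": 0.7,        # randomness
    "top_p": 0.9,              # nucleus sampling threshold
    "top_k": 40,               # top-k sampling
    "max_tokens": 512,         # must not exceed the 16,384 completion cap
    "stop": ["\n\n"],          # stop sequences
    "frequency_penalty": 0.1,  # penalize frequent tokens
    "presence_penalty": 0.0,   # penalize tokens already present
    "repetition_penalty": 1.05,
    "seed": 42,                # reproducibility
    "min_p": 0.05,             # minimum probability threshold
}
```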
Structured Outputs
This model supports structured outputs and response formatting for JSON mode and other structured generation tasks.
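A hedged sketch of requesting JSON mode, assuming LangMart follows the common OpenAI-style `response_format` convention (verify against the provider docs before relying on it):

```python
import json

# Assumption: an OpenAI-compatible "response_format" field enables JSON mode.
payload = {
    "model": "meta-llama/llama-4-maverick",
    "messages": [
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Give the model's release date and license as JSON."},
    ],
    "response_format": {"type": "json_object"},  # request structured JSON output
}

body = json.dumps(payload)  # serialized request body
```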
Other models in the Llama 4 family:
- meta-llama/llama-4-scout - Smaller, faster variant
- meta-llama/llama-4-maverick:free - Free tier version (if available)
Providers
DeepInfra (Primary)
| Property | Value |
|---|---|
| Provider Model ID | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 |
| Quantization | FP8 |
| Context Window | 1,048,576 tokens |
| Max Completion Tokens | 16,384 |
| Max Tokens Per Image | 3,342 |
Modalities
| Type | Support |
|---|---|
| Input - Text | Yes |
| Input - Image | Yes |
| Output - Text | Yes |
| Output - Image | No |
Image Processing
- Max Tokens Per Image: 3,342
- Image Price: $0.0006684 per image
Supported Languages
The model supports 12 languages (specific language list not detailed in source).
Usage
API Endpoint
POST https://api.langmart.ai/v1/chat/completions
Example Request
Text-only request:

```json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
```

Multimodal request with an image:

```json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
```
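A minimal Python sketch for sending the text-only request above using only the standard library. The `LANGMART_API_KEY` environment-variable name and the Bearer auth scheme are assumptions, not documented here:

```python
import json
import os
import urllib.request

API_URL = "https://api.langmart.ai/v1/chat/completions"

def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; LANGMART_API_KEY is a hypothetical variable name.
            "Authorization": f"Bearer {os.environ.get('LANGMART_API_KEY', '')}",
        },
        method="POST",
    )

req = build_request({
    "model": "meta-llama/llama-4-maverick",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
})
# To actually send it: urllib.request.urlopen(req) returns the JSON response.
```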
Notes
- This is a Mixture of Experts model, meaning only 17B parameters are active during inference despite having 400B total parameters
- The 1M+ token context window makes it suitable for very long document processing
- Native multimodality through early fusion is designed to give tighter vision-language integration than adapter-based approaches
- The FP8 quantization maintains good quality while improving inference efficiency