Meta: Llama 4 Maverick

Source: LangMart

Overview

  • Model ID: meta-llama/llama-4-maverick
  • Full Name: Meta: Llama 4 Maverick
  • Short Name: Llama 4 Maverick
  • Author: meta-llama
  • Release Date: April 5, 2025
  • License: Llama 4 Community License

Description

A high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total parameters). The model supports multilingual text and image inputs, producing text and code outputs across 12 languages.

Key features:

  • Early fusion for native multimodality, enabling seamless vision-language integration
  • Instruction-tuned for assistant-like behavior and vision-language tasks
  • Trained on approximately 22 trillion tokens from public, licensed, and Meta-platform sources
  • Knowledge cutoff: August 2024
  • Released under the Llama 4 Community License for both research and commercial use

Technical Specifications

  • Architecture: Mixture of Experts (MoE)
  • Total Parameters: 400B
  • Active Parameters: 17B per forward pass
  • Number of Experts: 128
  • Context Length: 1,048,576 tokens
  • Max Completion Tokens: 16,384
  • Training Data: ~22 trillion tokens
  • Knowledge Cutoff: August 2024

Pricing

  • Input: $0.15 per 1M tokens
  • Output: $0.60 per 1M tokens
  • Image: $0.0006684 per image
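As a rough illustration of how these rates combine into a per-request cost (the token and image counts below are arbitrary examples, not typical usage):

```python
# Published LangMart rates for meta-llama/llama-4-maverick
INPUT_PER_M = 0.15       # USD per 1M input tokens
OUTPUT_PER_M = 0.60      # USD per 1M output tokens
IMAGE_PRICE = 0.0006684  # USD per image

def request_cost(input_tokens: int, output_tokens: int, images: int = 0) -> float:
    """Estimated USD cost of a single request at the published rates."""
    return (input_tokens / 1_000_000 * INPUT_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PER_M
            + images * IMAGE_PRICE)

# e.g. a 10,000-token prompt with 2 images and a 1,000-token completion
cost = request_cost(10_000, 1_000, images=2)
print(f"${cost:.6f}")  # → $0.003437
```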

Capabilities

  • Text Generation: Yes
  • Vision/Image Understanding: Yes
  • Tool Calling: Yes
  • Structured Outputs: Yes
  • Trainable (Text): Yes
  • Reasoning: No

Supported Parameters

  • temperature: Controls randomness in generation
  • top_p: Nucleus sampling threshold
  • top_k: Top-k sampling
  • max_tokens: Maximum tokens to generate
  • stop: Stop sequences
  • frequency_penalty: Penalize frequent tokens
  • presence_penalty: Penalize tokens already present
  • repetition_penalty: Alternative repetition penalty
  • seed: Random seed for reproducibility
  • min_p: Minimum probability threshold
  • response_format: Structured output format
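A request body combining several of these parameters might look like the following sketch; the parameter values are arbitrary illustrations, not tuning recommendations:

```python
import json

# Hypothetical request body exercising several supported parameters
payload = {
    "model": "meta-llama/llama-4-maverick",
    "messages": [
        {"role": "user", "content": "Summarize MoE routing in two sentences."}
    ],
    "temperature": 0.7,   # randomness of sampling
    "top_p": 0.9,         # nucleus sampling threshold
    "max_tokens": 512,    # completion cap (model maximum is 16,384)
    "stop": ["\n\n"],     # stop generation at a blank line
    "seed": 42,           # best-effort reproducibility
}

body = json.dumps(payload)  # serialized JSON sent as the POST body
```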

Structured Outputs

This model supports structured outputs and response formatting for JSON mode and other structured generation tasks.
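The exact response_format shape LangMart expects is not specified on this page; the OpenAI-style json_object mode used below is an assumption to be checked against the API reference:

```python
import json

# Assumed OpenAI-compatible JSON mode; verify the exact shape
# of "response_format" against LangMart's API documentation.
payload = {
    "model": "meta-llama/llama-4-maverick",
    "messages": [
        {"role": "system", "content": "Reply only with JSON."},
        {"role": "user",
         "content": "Give the model's total and active parameter counts."},
    ],
    "response_format": {"type": "json_object"},
}

body = json.dumps(payload)
```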

Related Models

Other models in the Llama 4 family:

  • meta-llama/llama-4-scout - Smaller, faster variant
  • meta-llama/llama-4-maverick:free - Free tier version (if available)

Providers

DeepInfra (Primary)

  • Provider Model ID: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
  • Quantization: FP8
  • Context Window: 1,048,576 tokens
  • Max Completion Tokens: 16,384
  • Max Tokens Per Image: 3,342

Modalities

  • Input - Text: Yes
  • Input - Image: Yes
  • Output - Text: Yes
  • Output - Image: No

Image Processing

  • Max Tokens Per Image: 3,342
  • Image Price: $0.0006684 per image

Supported Languages

The model supports 12 languages. The LangMart page does not enumerate them; per Meta's Llama 4 model card, they are Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.

Usage

API Endpoint

POST https://api.langmart.ai/v1/chat/completions

Example Request

{
  "model": "meta-llama/llama-4-maverick",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}

With Image Input

{
  "model": "meta-llama/llama-4-maverick",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
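Either payload can be sent from Python with nothing beyond the standard library. In this sketch, LANGMART_API_KEY is a placeholder for a real key, and the urlopen call is commented out so the example reads without a live endpoint:

```python
import json
import urllib.request

API_KEY = "LANGMART_API_KEY"  # placeholder; substitute your real key

payload = {
    "model": "meta-llama/llama-4-maverick",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
}

# Build the POST request against the chat completions endpoint
req = urllib.request.Request(
    "https://api.langmart.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# resp = urllib.request.urlopen(req)  # live network call, omitted here
# print(json.load(resp)["choices"][0]["message"]["content"])
```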

Notes

  • This is a Mixture of Experts model, meaning only 17B parameters are active during inference despite having 400B total parameters
  • The 1M+ token context window makes it suitable for very long document processing
  • Native multimodality through early fusion is designed to integrate vision and language more tightly than adapter-based approaches
  • The FP8 quantization used by the DeepInfra deployment trades a small amount of numerical precision for improved inference efficiency