LangMart: Meta: Llama 4 Scout
Model Overview
| Property | Value |
|---|---|
| Model ID | openrouter/meta-llama/llama-4-scout |
| Name | Meta: Llama 4 Scout |
| Provider | meta-llama |
| Released | 2025-04-05 |
Description
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses a 16-expert MoE design and supports a context length of up to 10 million tokens, with a training corpus of ~40 trillion tokens.
Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.
Provider
meta-llama
Specifications
| Spec | Value |
|---|---|
| Context Window | 327,680 tokens |
| Modalities | text+image->text |
| Input Modalities | text, image |
| Output Modalities | text |
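Since the model accepts both text and image input, a request must combine the two in one user message. The sketch below builds such a message using the content-parts convention common to OpenAI-compatible APIs; the exact field names are an assumption, not confirmed by this listing.

```python
# Sketch: building a multimodal (text + image) chat message in the
# common content-parts shape. Field names ("type", "image_url") follow
# the OpenAI-compatible convention and are an assumption here.
def build_vision_message(prompt: str, image_url: str) -> dict:
    """Combine a text prompt and an image URL into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical usage: ask the model to describe a hosted image.
message = build_vision_message(
    "Describe this chart.",
    "https://example.com/chart.png",
)
```

Note that the output side is text-only, so image parts are valid only in input messages.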
Pricing
| Type | Price |
|---|---|
| Input | $0.08 per 1M tokens |
| Output | $0.30 per 1M tokens |
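The per-million-token rates above make cost estimation simple arithmetic. A minimal sketch, using the listed table prices:

```python
# Sketch: estimating request cost from the listed per-million-token rates.
INPUT_PRICE_PER_M = 0.08   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_M = 0.30  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a 200k-token document summarized into 2k output tokens:
cost = estimate_cost(200_000, 2_000)  # ~$0.0166
```

The asymmetry (output costs ~4x input) means long-input, short-output workloads such as summarization stay especially cheap.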
Capabilities
- Frequency penalty
- Logit bias
- Max tokens
- Min p
- Presence penalty
- Repetition penalty
- Response format
- Seed
- Stop
- Structured outputs
- Temperature
- Tool choice
- Tools
- Top k
- Top p
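The supported parameters above map onto a standard chat-completions request body. A minimal sketch, assuming the OpenAI-compatible parameter names OpenRouter uses (the model ID is taken from the overview table; the parameter names themselves are an assumption):

```python
# Sketch: a request body exercising several of the supported parameters.
# Parameter names follow the OpenAI-compatible convention, which is an
# assumption; only parameters from the capabilities list are used.
payload = {
    "model": "openrouter/meta-llama/llama-4-scout",
    "messages": [
        {"role": "user", "content": "List three uses of MoE models."}
    ],
    "temperature": 0.7,        # sampling temperature
    "top_p": 0.9,              # nucleus sampling
    "top_k": 40,               # top-k filtering
    "frequency_penalty": 0.2,  # discourage repeated tokens
    "max_tokens": 512,         # cap on output length
    "seed": 42,                # best-effort reproducibility
    "stop": ["\n\n"],          # stop sequences
}
```

This dict would be sent as the JSON body of a POST to the provider's chat-completions endpoint.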
Detailed Analysis
Llama 4 Scout on LangMart is Meta's efficiency-focused multimodal MoE model: 17B active parameters drawn from 109B total across 16 experts, optimized for extreme context length and cost-effectiveness. The architecture supports contexts up to 10 million tokens, the longest of the Llama 4 family, though the deployment listed here exposes a 327,680-token window. Scout activates only two experts per token, making it efficient enough to run on a single H100 GPU with Int4 quantization.

This variant excels at memory-intensive workflows: long-document summarization, multi-file code analysis, forensic document review, and extensive data verification. It was pretrained on 40 trillion tokens and uses an interleaved RoPE (iRoPE) attention scheme designed to keep attention tractable at very long context lengths.

On benchmarks, Scout achieves roughly 99% accuracy on straightforward information extraction and 38.1% on LiveCodeBench, but only 45-70% on complex conditional-logic tasks (versus Maverick's 85-92%). Input pricing is provider-dependent: this listing shows $0.08 per 1M input tokens, with some providers charging $0.11-0.13 per 1M, under standard rate limits.

The 16-expert architecture prioritizes throughput and cost over reasoning depth, making Scout a fit for document-heavy applications where context window matters more than nuanced logic. Choose this tier when you need maximum context length, single-GPU deployment efficiency, cost-effective processing of large document sets, or straightforward information extraction without complex reasoning requirements.