LangMart: Meta: Llama 4 Scout
Model Overview
| Property | Value |
|---|---|
| Model ID | openrouter/meta-llama/llama-4-scout |
| Name | Meta: Llama 4 Scout |
| Provider | meta-llama |
| Released | 2025-04-05 |
Description
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses a 16-expert MoE design and supports a context length of up to 10 million tokens, with a training corpus of ~40 trillion tokens.
Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.
Provider
meta-llama
Specifications
| Spec | Value |
|---|---|
| Context Window | 327,680 tokens |
| Modalities | text+image->text |
| Input Modalities | text, image |
| Output Modalities | text |
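Since the model accepts both text and image input, a request must combine the two in one user message. The sketch below builds such a message using the content-parts convention common to OpenAI-compatible APIs; the exact field names are an assumption, not confirmed by this listing.

```python
# Sketch: building a multimodal (text + image) chat message in the
# common content-parts shape. Field names ("type", "image_url") follow
# the OpenAI-compatible convention and are an assumption here.
def build_vision_message(prompt: str, image_url: str) -> dict:
    """Combine a text prompt and an image URL into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical usage: ask the model to describe a hosted image.
message = build_vision_message(
    "Describe this chart.",
    "https://example.com/chart.png",
)
```

Note that the output side is text-only, so image parts are valid only in input messages.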
Pricing
| Type | Price |
|---|---|
| Input | $0.08 per 1M tokens |
| Output | $0.30 per 1M tokens |
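The per-million-token rates above make cost estimation simple arithmetic. A minimal sketch, using the listed table prices:

```python
# Sketch: estimating request cost from the listed per-million-token rates.
INPUT_PRICE_PER_M = 0.08   # USD per 1M input tokens (from the table above)
OUTPUT_PRICE_PER_M = 0.30  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a 200k-token document summarized into 2k output tokens:
cost = estimate_cost(200_000, 2_000)  # ~$0.0166
```

The asymmetry (output costs ~4x input) means long-input, short-output workloads such as summarization stay especially cheap.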
Capabilities
- Frequency penalty
- Logit bias
- Max tokens
- Min p
- Presence penalty
- Repetition penalty
- Response format
- Seed
- Stop
- Structured outputs
- Temperature
- Tool choice
- Tools
- Top k
- Top p
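The supported parameters above map onto a standard chat-completions request body. A minimal sketch, assuming the OpenAI-compatible parameter names OpenRouter uses (the model ID is taken from the overview table; the parameter names themselves are an assumption):

```python
# Sketch: a request body exercising several of the supported parameters.
# Parameter names follow the OpenAI-compatible convention, which is an
# assumption; only parameters from the capabilities list are used.
payload = {
    "model": "openrouter/meta-llama/llama-4-scout",
    "messages": [
        {"role": "user", "content": "List three uses of MoE models."}
    ],
    "temperature": 0.7,        # sampling temperature
    "top_p": 0.9,              # nucleus sampling
    "top_k": 40,               # top-k filtering
    "frequency_penalty": 0.2,  # discourage repeated tokens
    "max_tokens": 512,         # cap on output length
    "seed": 42,                # best-effort reproducibility
    "stop": ["\n\n"],          # stop sequences
}
```

This dict would be sent as the JSON body of a POST to the provider's chat-completions endpoint.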
Detailed Analysis
Llama 4 Scout on LangMart is Meta's efficiency-focused multimodal MoE model: 17B active parameters drawn from 109B total across 16 experts, optimized for extreme context length and cost-effectiveness. The architecture supports contexts up to 10 million tokens, the longest of the Llama 4 family, though the deployment listed here exposes a 327,680-token window. Scout activates only two experts per token, making it efficient enough to run on a single H100 GPU with Int4 quantization.

This variant excels at memory-intensive workflows: long-document summarization, multi-file code analysis, forensic document review, and extensive data verification. It was pretrained on 40 trillion tokens and uses an interleaved RoPE (iRoPE) attention scheme designed to keep attention tractable at very long context lengths.

On benchmarks, Scout achieves roughly 99% accuracy on straightforward information extraction and 38.1% on LiveCodeBench, but only 45-70% on complex conditional-logic tasks (versus Maverick's 85-92%). Input pricing is provider-dependent: this listing shows $0.08 per 1M input tokens, with some providers charging $0.11-0.13 per 1M, under standard rate limits.

The 16-expert architecture prioritizes throughput and cost over reasoning depth, making Scout a fit for document-heavy applications where context window matters more than nuanced logic. Choose this tier when you need maximum context length, single-GPU deployment efficiency, cost-effective processing of large document sets, or straightforward information extraction without complex reasoning requirements.