
LangMart: Meta: Llama 4 Scout

OpenRouter · Vision · 328K context · $0.0800 input /1M · $0.3000 output /1M · N/A max output

Model Overview

| Property | Value |
|----------|-------|
| Model ID | openrouter/meta-llama/llama-4-scout |
| Name | Meta: Llama 4 Scout |
| Provider | meta-llama |
| Released | 2025-04-05 |

Description

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens.

Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.

Provider

meta-llama

Specifications

| Spec | Value |
|------|-------|
| Context Window | 327,680 tokens |
| Modalities | text+image->text |
| Input Modalities | text, image |
| Output Modalities | text |
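Since the listing advertises text+image input, a request must interleave both content types in one message. A minimal sketch of such a message body, assuming the OpenAI-style chat schema that OpenRouter accepts (the image URL is a placeholder):

```python
# Sketch of a multimodal (text + image) request body for Llama 4 Scout.
# Assumes the OpenAI-compatible chat schema exposed by OpenRouter;
# the model slug and image URL are illustrative.

def build_vision_message(prompt: str, image_url: str) -> dict:
    """Combine a text prompt and one image into a single user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

payload = {
    "model": "meta-llama/llama-4-scout",
    "messages": [
        build_vision_message("Describe this chart.", "https://example.com/chart.png"),
    ],
}
```

The same `messages` list can mix plain-text and multimodal turns; output is always text, per the modalities table above.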

Pricing

| Type | Price |
|------|-------|
| Input | $0.08 per 1M tokens |
| Output | $0.30 per 1M tokens |
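The listed rates make per-request cost a simple linear calculation. A back-of-the-envelope sketch (token counts are hypothetical):

```python
# Cost estimate using the listed rates:
# $0.08 per 1M input tokens, $0.30 per 1M output tokens.

INPUT_PER_M = 0.08
OUTPUT_PER_M = 0.30

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: summarizing a 300K-token document into 2K tokens of output
# costs roughly $0.024 + $0.0006 = ~$0.025.
cost = estimate_cost(300_000, 2_000)
```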

Capabilities

  • Frequency penalty
  • Logit bias
  • Max tokens
  • Min p
  • Presence penalty
  • Repetition penalty
  • Response format
  • Seed
  • Stop
  • Structured outputs
  • Temperature
  • Tool choice
  • Tools
  • Top k
  • Top p
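The capabilities above correspond to request parameters in OpenRouter's OpenAI-compatible schema. A hedged sketch of a request body exercising several of them (all values are examples, not recommendations):

```python
# Example request body using several of the supported parameters.
# Field names follow the OpenAI-compatible schema OpenRouter exposes;
# the specific values here are illustrative only.

payload = {
    "model": "meta-llama/llama-4-scout",
    "messages": [
        {"role": "user", "content": "List three uses for a long context window."},
    ],
    "max_tokens": 512,           # cap on generated tokens
    "temperature": 0.7,          # sampling randomness
    "top_p": 0.9,                # nucleus sampling
    "top_k": 40,                 # top-k cutoff
    "min_p": 0.05,               # minimum-probability cutoff
    "frequency_penalty": 0.1,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.05,
    "seed": 42,                  # best-effort reproducibility
    "stop": ["###"],             # stop sequence
}
```

Tool use (`tools`, `tool_choice`), `response_format`, and structured outputs follow the same schema and can be added to the body in the same way.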

Detailed Analysis

Llama 4 Scout on LangMart is Meta's efficiency-focused multimodal MoE model: 17B active parameters drawn from 109B total across 16 experts, optimized for extreme context length (10M tokens - the longest available) and cost-effectiveness. Scout activates only 2 experts per token, making it highly efficient and capable of running on a single H100 GPU with Int4 quantization.

This variant excels at memory-intensive workflows: long-document summarization, multi-file code analysis, forensic document review, and extensive data verification tasks. The model was pretrained on ~40 trillion tokens and uses an interleaved RoPE architecture to reduce quadratic attention complexity at scale.

On benchmarks, Scout achieves ~99% accuracy on straightforward information extraction and 38.1% accuracy on LiveCodeBench, but only 45-70% on complex conditional logic tasks (vs Maverick's 85-92%). OpenRouter provides competitive pricing at $0.11-0.13/M input tokens (provider-dependent) with standard rate limits.

The 16-expert architecture prioritizes throughput and cost over reasoning depth, making it ideal for document-heavy applications where context window matters more than nuanced logic. Choose OpenRouter's Scout tier when you need maximum 10M-token context, single-GPU deployment efficiency, cost-effective processing of large document sets, or straightforward information extraction without complex reasoning requirements.
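For the document-heavy workloads described above, a quick feasibility check is whether the corpus fits the context window at all. A sketch using the common ~4 characters-per-token heuristic (an assumption, not the model's actual tokenizer):

```python
# Rough check of whether a text corpus fits Scout's context windows.
# The 4-chars-per-token ratio is a heuristic approximation, not a
# property of the Llama 4 tokenizer.

SCOUT_NATIVE_CONTEXT = 10_000_000   # tokens, Meta's stated maximum
OPENROUTER_CONTEXT = 327_680        # tokens, this listing's window

def approx_tokens(num_chars: int) -> int:
    """Estimate token count from character count (~4 chars/token)."""
    return num_chars // 4

def fits(num_chars: int, window: int = OPENROUTER_CONTEXT) -> bool:
    """True if the estimated token count fits within the window."""
    return approx_tokens(num_chars) <= window

# A ~1 MB plain-text corpus (~250K tokens) fits the OpenRouter window;
# a ~10 MB corpus (~2.5M tokens) would need the native 10M window.
```

Estimates like this only gate obvious overflows; a real pipeline should count tokens with the model's tokenizer before dispatch.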