Groq: Llama 4 Scout 17B 16E Instruct


Model Overview

Property     Value
----------   ----------------------------------------------
Model ID     groq/meta-llama/llama-4-scout-17b-16e-instruct
Name         Llama 4 Scout 17B 16E Instruct
Provider     Groq / Meta
Parameters   17B

Description

Meta's Llama 4 Scout model with 17 billion active parameters routed across 16 experts. An efficient mixture-of-experts (MoE) variant designed for fast, lightweight inference on Groq infrastructure.

Specifications

Spec              Value
---------------   ----------------
Context Window    131,072 tokens
Max Completion    8,192 tokens
Inference Speed   ~750 tokens/sec
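
The two token limits above interact: the prompt and the requested completion must together fit in the context window, and the completion alone is capped at 8,192 tokens. A minimal sketch of a pre-flight budget check (the function name and example token counts are illustrative, not part of any API):

```python
# Limits from the Specifications table above.
CONTEXT_WINDOW = 131_072   # total tokens (prompt + completion)
MAX_COMPLETION = 8_192     # hard cap on generated tokens

def fits(prompt_tokens: int, max_tokens: int) -> bool:
    """True if a prompt of prompt_tokens plus a completion of up to
    max_tokens stays within both limits."""
    return max_tokens <= MAX_COMPLETION and prompt_tokens + max_tokens <= CONTEXT_WINDOW

fits(120_000, 8_192)   # True:  120,000 + 8,192 = 128,192 <= 131,072
fits(125_000, 8_192)   # False: 125,000 + 8,192 = 133,192 >  131,072
```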

Pricing

Type     Price
------   -------------------
Input    $0.05 per 1M tokens
Output   $0.15 per 1M tokens
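
Per-request cost follows directly from the table: each side is billed at its per-million-token rate. A small sketch of the arithmetic (the example token counts are illustrative):

```python
# Rates from the Pricing table above, in USD per 1M tokens.
INPUT_PER_M = 0.05
OUTPUT_PER_M = 0.15

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed Scout rates."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# A 10,000-token prompt with a 2,000-token completion:
cost = request_cost(10_000, 2_000)  # 0.0005 + 0.0003 = 0.0008 USD
```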

Capabilities

  • Instruction Following: Yes
  • Fast Inference: Yes
  • Streaming: Yes
  • Efficient MoE: Yes

Use Cases

High-speed inference, cost-sensitive applications, real-time chat.

Integration with LangMart

Gateway Support: Type 2 (Cloud), Type 3 (Self-hosted)

API Usage:

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/meta-llama/llama-4-scout-17b-16e-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 4096
  }'
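
Since streaming is a listed capability, the same request can be sent with "stream": true, in which case the response arrives as server-sent events. Assuming the gateway follows the usual OpenAI-compatible SSE format (each event is a line of the form `data: {...}`, terminated by `data: [DONE]` — an assumption, since the exact wire format is not shown above), the deltas can be collected like this; the sample lines below stand in for what an HTTP client's line iterator would yield:

```python
import json

# Stand-in for the lines an HTTP client yields from a streamed response
# (e.g. requests.post(..., stream=True).iter_lines()).
sample_lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]

def collect_stream(lines):
    """Concatenate the content deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip keep-alives / blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

collect_stream(sample_lines)  # "Hello"
```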

Related Models

  • groq/meta-llama/llama-4-maverick-17b-128e-instruct - Maverick variant
  • groq/llama-3.1-8b-instant - Fast 8B model

Last Updated: December 28, 2025