Groq: Whisper Large V3 Turbo

Model Overview

Property Value
Model ID groq/whisper-large-v3-turbo
Name Whisper Large V3 Turbo
Parameters 809M

Description

Whisper Large V3 Turbo is a pruned, speed-optimized variant of OpenAI's Whisper Large v3, an automatic speech recognition (ASR) model served on Groq's LPU inference infrastructure. It transcribes multilingual audio to text; it is not a general-purpose chat language model.

Specifications

Spec Value
Context Window 128K tokens
Max Completion 8K tokens
Inference Speed 216x real-time (Groq benchmark)

Pricing

Type Price
Input $0.04 per 1M tokens
Output $0.00 per 1M tokens
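
Note: the table above is denominated in tokens, but Groq's published Whisper pricing (see Detailed Analysis) is per hour of audio. Under that rate, a rough worked example: 1,000 hours of audio × $0.04/hour = $40.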

Capabilities

  • Fast inference on Groq's LPU engine
  • Cost-effective, high-volume transcription
  • Reliable production performance
  • Streaming support

Limitations

  • 128K token context window
  • Maximum completion tokens: 8K
  • Not trained for translation tasks (use standard Whisper Large v3 for non-English to English translation)

Performance

Groq specializes in rapid inference with industry-leading throughput. Typical use cases include:

  • Real-time transcription and live captioning
  • High-volume batch transcription (see the sketch after this list)
  • Customer support call analysis
  • Cost-sensitive, latency-sensitive voice deployments
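
The batch sketch referenced above is a minimal example, assuming a local ./audio directory of .mp3 files and the transcription endpoint shown in the Integration section; paths and filenames are illustrative, not part of Groq's documentation:

#!/usr/bin/env bash
# Transcribe every .mp3 in ./audio, writing one .txt transcript per file.
# Assumes GROQ_API_KEY is set in the environment.
for f in ./audio/*.mp3; do
  curl -s -X POST https://api.langmart.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -F "model=groq/whisper-large-v3-turbo" \
    -F "file=@${f}" \
    -F "response_format=text" \
    -o "${f%.mp3}.txt"
done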

Detailed Analysis

Whisper Large v3 Turbo is a highly optimized ASR model created by pruning and fine-tuning the standard Whisper Large v3. It reduces the parameter count by nearly 50% (from 1.55B to 809M) and cuts the decoder from 32 layers to just 4, achieving an impressive 8x speed boost over standard Large v3. On Groq's LPU infrastructure, Turbo reaches 216x real-time speed, 15% faster than the standard v3 (189x), while maintaining a competitive WER of 12% (vs 10% for standard). OpenAI achieved this by fine-tuning for two additional epochs over the same multilingual transcription data, excluding translation data.

The model excels at multilingual transcription, with tied-best WER in languages such as French, though it degrades more in Thai and Cantonese. Priced at just $0.04 per hour of audio, Turbo offers exceptional value. Choose Turbo when speed and cost-efficiency are priorities and you can accept a roughly 2-point WER trade-off. It is ideal for real-time transcription services, live captioning, high-volume batch processing, customer support call analysis, and interactive voice applications where sub-second latency matters.

Important limitation: Turbo is NOT trained for translation tasks. Use standard Large v3 for non-English to English translation.
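
Because Turbo is not trained for translation, a translation request would target standard Large v3 instead. A minimal sketch, assuming the gateway exposes the OpenAI-compatible /v1/audio/translations endpoint and a groq/whisper-large-v3 model ID (both are assumptions, not confirmed by this page):

# Translate non-English speech to English via the standard (non-Turbo) model.
curl -X POST https://api.langmart.ai/v1/audio/translations \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3" \
  -F "file=@interview_fr.mp3"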

Best Practices

  1. Prompting: Use the optional prompt parameter to guide spelling of names, acronyms, and domain jargon (see the sketch after this list)
  2. Streaming: Use streaming responses for real-time applications
  3. Batch Processing: Leverage the high TPM limit for batch inference
  4. Audio Chunking: Split long recordings into segments so each transcript stays within the 8K completion limit
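
A minimal sketch of practice 1, using the prompt field defined by the OpenAI-compatible transcription API to bias spelling toward domain terms (whether this gateway forwards the field is an assumption; the filename is illustrative):

# Bias the transcript toward correct spellings of product and company names.
curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@earnings_call.mp3" \
  -F "prompt=Groq, LPU, Whisper Large V3 Turbo"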

Rate Limits

  • 30,000 TPM (tokens per minute)
  • Optimized for high-throughput inference (a retry sketch for rate-limit errors follows)
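
When a burst exceeds the TPM budget, the API will typically respond with HTTP 429. A minimal retry sketch; the backoff policy is generic HTTP practice, not Groq-specific documentation:

# Retry with exponential backoff while the API returns 429 (rate limited).
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    -X POST https://api.langmart.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -F "model=groq/whisper-large-v3-turbo" \
    -F "file=@meeting.mp3")
  [ "$status" != "429" ] && break
  sleep $((2 ** attempt))   # back off 2s, 4s, 8s, ...
done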

Features

  • High-speed inference (216x real-time on Groq's LPU)
  • 128K token context window
  • Suitable for: speech-to-text, audio transcription, multilingual audio (language-hint sketch below)
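
For multilingual audio, the OpenAI-compatible API accepts an ISO-639-1 language hint, which can improve both accuracy and latency. A sketch; the French filename is illustrative:

# Hint the source language for French audio.
curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@podcast_fr.mp3" \
  -F "language=fr"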

Integration

Use the OpenAI-compatible audio transcription endpoint. Because this is an ASR model, requests are multipart form uploads of audio files, not chat messages:

curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@sample_audio.mp3" \
  -F "response_format=json"
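
For segment-level timestamps, the OpenAI-compatible API also defines a verbose_json response format; support on this gateway is an assumption, so verify against its documentation:

# Request a verbose JSON transcript with per-segment timestamps.
curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@sample_audio.mp3" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=segment"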

Last updated: December 2025
Source: Groq Official Documentation