Groq: Whisper Large V3 Turbo

Model Overview

Property Value
Model ID groq/whisper-large-v3-turbo
Name Whisper Large V3 Turbo
Parameters 809M

Description

Whisper Large V3 Turbo is a pruned, speed-optimized variant of OpenAI's Whisper Large v3, an automatic speech recognition (ASR) model served on Groq's LPU inference infrastructure. It transcribes multilingual audio to text; it is not a general-purpose chat language model.

Specifications

Spec Value
Context Window 128K tokens
Max Completion 8K tokens
Inference Speed 216x real-time (Groq benchmark)

Pricing

Type Price
Input $0.04 per 1M tokens
Output $0.00 per 1M tokens
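
Note: the table above is denominated in tokens, but Groq's published Whisper pricing (see Detailed Analysis) is per hour of audio. Under that rate, a rough worked example: 1,000 hours of audio × $0.04/hour = $40.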

Capabilities

  • Fast inference on Groq's LPU engine
  • Cost-effective, high-volume transcription
  • Reliable production performance
  • Streaming support

Limitations

  • 128K token context window
  • Maximum completion tokens: 8K
  • Not trained for translation tasks (use standard Whisper Large v3 for non-English to English translation)

Performance

Groq specializes in rapid inference with industry-leading throughput. Typical use cases include:

  • Real-time transcription and live captioning
  • High-volume batch transcription (see the sketch after this list)
  • Customer support call analysis
  • Cost-sensitive, latency-sensitive voice deployments
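
The batch sketch referenced above is a minimal example, assuming a local ./audio directory of .mp3 files and the transcription endpoint shown in the Integration section; paths and filenames are illustrative, not part of Groq's documentation:

#!/usr/bin/env bash
# Transcribe every .mp3 in ./audio, writing one .txt transcript per file.
# Assumes GROQ_API_KEY is set in the environment.
for f in ./audio/*.mp3; do
  curl -s -X POST https://api.langmart.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -F "model=groq/whisper-large-v3-turbo" \
    -F "file=@${f}" \
    -F "response_format=text" \
    -o "${f%.mp3}.txt"
done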

Detailed Analysis

Whisper Large v3 Turbo is a highly optimized ASR model created by pruning and fine-tuning the standard Whisper Large v3. It reduces the parameter count by nearly 50% (from 1.55B to 809M) and cuts the decoder from 32 layers to just 4, achieving an impressive 8x speed boost over standard Large v3. On Groq's LPU infrastructure, Turbo reaches 216x real-time speed, 15% faster than the standard v3 (189x), while maintaining a competitive WER of 12% (vs 10% for standard). OpenAI achieved this by fine-tuning for two additional epochs over the same multilingual transcription data, excluding translation data.

The model excels at multilingual transcription, with tied-best WER in languages such as French, though it degrades more in Thai and Cantonese. Priced at just $0.04 per hour of audio, Turbo offers exceptional value. Choose Turbo when speed and cost-efficiency are priorities and you can accept a roughly 2-point WER trade-off. It is ideal for real-time transcription services, live captioning, high-volume batch processing, customer support call analysis, and interactive voice applications where sub-second latency matters.

Important limitation: Turbo is NOT trained for translation tasks. Use standard Large v3 for non-English to English translation.
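
Because Turbo is not trained for translation, a translation request would target standard Large v3 instead. A minimal sketch, assuming the gateway exposes the OpenAI-compatible /v1/audio/translations endpoint and a groq/whisper-large-v3 model ID (both are assumptions, not confirmed by this page):

# Translate non-English speech to English via the standard (non-Turbo) model.
curl -X POST https://api.langmart.ai/v1/audio/translations \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3" \
  -F "file=@interview_fr.mp3"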

Best Practices

  1. Prompting: Use the optional prompt parameter to guide spelling of names, acronyms, and domain jargon (see the sketch after this list)
  2. Streaming: Use streaming responses for real-time applications
  3. Batch Processing: Leverage the high TPM limit for batch inference
  4. Audio Chunking: Split long recordings into segments so each transcript stays within the 8K completion limit
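
A minimal sketch of practice 1, using the prompt field defined by the OpenAI-compatible transcription API to bias spelling toward domain terms (whether this gateway forwards the field is an assumption; the filename is illustrative):

# Bias the transcript toward correct spellings of product and company names.
curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@earnings_call.mp3" \
  -F "prompt=Groq, LPU, Whisper Large V3 Turbo"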

Rate Limits

  • 30,000 TPM (tokens per minute)
  • Optimized for high-throughput inference (a retry sketch for rate-limit errors follows)
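
When a burst exceeds the TPM budget, the API will typically respond with HTTP 429. A minimal retry sketch; the backoff policy is generic HTTP practice, not Groq-specific documentation:

# Retry with exponential backoff while the API returns 429 (rate limited).
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    -X POST https://api.langmart.ai/v1/audio/transcriptions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -F "model=groq/whisper-large-v3-turbo" \
    -F "file=@meeting.mp3")
  [ "$status" != "429" ] && break
  sleep $((2 ** attempt))   # back off 2s, 4s, 8s, ...
done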

Features

  • High-speed inference (216x real-time on Groq's LPU)
  • 128K token context window
  • Suitable for: speech-to-text, audio transcription, multilingual audio (language-hint sketch below)
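
For multilingual audio, the OpenAI-compatible API accepts an ISO-639-1 language hint, which can improve both accuracy and latency. A sketch; the French filename is illustrative:

# Hint the source language for French audio.
curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@podcast_fr.mp3" \
  -F "language=fr"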

Integration

Use the OpenAI-compatible audio transcription endpoint. Because this is an ASR model, requests are multipart form uploads of audio files, not chat messages:

curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@sample_audio.mp3" \
  -F "response_format=json"
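
For segment-level timestamps, the OpenAI-compatible API also defines a verbose_json response format; support on this gateway is an assumption, so verify against its documentation:

# Request a verbose JSON transcript with per-segment timestamps.
curl -X POST https://api.langmart.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=groq/whisper-large-v3-turbo" \
  -F "file=@sample_audio.mp3" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=segment"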

Last updated: December 2025
Source: Groq Official Documentation