Groq: DeepSeek R1 Distill Llama 70B

Model Overview

  Model ID:    groq/deepseek-r1-distill-llama-70b
  Name:        DeepSeek R1 Distill Llama 70B
  Parameters:  70B

Description

DeepSeek R1 Distill Llama 70B is a 70-billion-parameter model created by distilling DeepSeek-R1's reasoning ability into a Llama 3.3 70B base. Served on Groq's LPU inference engine, it targets distilled reasoning and multi-step problem solving at low latency.

Specifications

  Context Window:   131K tokens
  Max Completion:   8K tokens
  Inference Speed:  270 tokens/second

Pricing

  Input:   $0.59 per 1M tokens
  Output:  $0.79 per 1M tokens
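
At these rates, a request with 10,000 input tokens and 1,000 output tokens costs roughly (10,000 / 1,000,000) x $0.59 + (1,000 / 1,000,000) x $0.79, or about $0.0059 + $0.0008 = $0.0067.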

Capabilities

  • Fast inference engine (Groq LPU)
  • Cost-effective token processing
  • Reliable production performance
  • Streaming support

Limitations

  • 131K token context window
  • Maximum completion tokens: 8K
  • No image generation (inference only)

Performance

Groq's LPU inference engine delivers high token throughput (270 tokens/second for this model). Typical use cases include:

  • Real-time chat applications
  • Batch processing with predictable latency
  • High-volume inference workloads
  • Cost-sensitive deployments
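
At the listed 270 tokens/second, a 500-token completion takes roughly 1.9 seconds of generation time, excluding network and queueing overhead.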

Best Practices

  1. Token Optimization: Craft prompts to minimize token usage while maintaining quality
  2. Streaming: Use streaming responses for real-time applications (see the example after this list)
  3. Batch Processing: Leverage high TPM limits for batch inference
  4. Context Management: Utilize full context window for complex tasks
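
A minimal streaming sketch, assuming the endpoint follows the standard OpenAI streaming convention (a sequence of data: lines, each carrying an incremental delta, ending with data: [DONE]):

curl -N -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/deepseek-r1-distill-llama-70b",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Explain streaming in one paragraph."}
    ]
  }'

The -N flag disables curl's output buffering so that tokens are printed as they arrive.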

Rate Limits

  • 30,000 TPM (tokens per minute)
  • Optimized for high-throughput inference
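
If the TPM cap is exceeded, an OpenAI-compatible endpoint will typically respond with HTTP 429. A rough retry-with-backoff sketch in bash (the 429 handling is an assumption about the endpoint, not documented behavior):

# Retry up to 5 times, backing off exponentially whenever the API returns 429.
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    -X POST https://api.langmart.ai/v1/chat/completions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "groq/deepseek-r1-distill-llama-70b",
         "messages": [{"role": "user", "content": "Hello!"}]}')
  [ "$status" != "429" ] && break
  sleep $((2 ** attempt))   # wait 2s, 4s, 8s, ... before retrying
done
cat response.json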

Features

  • High-speed token generation (270 tokens/sec)
  • 131K token context window
  • Suitable for: Distilled reasoning, multi-step problem solving

Integration

Use the standard OpenAI-compatible API endpoint:

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
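
The reply is returned in choices[0].message.content. R1-distilled models typically wrap their chain of thought in <think>...</think> tags at the start of the content, so you may want to strip that block before showing the final answer. A rough post-processing sketch (assumes jq is installed and that the think-tag convention applies to this deployment):

# Extract the assistant message text, then drop the <think>...</think> reasoning block.
curl -s -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "What is 17 * 23?"}]
  }' |
  jq -r '.choices[0].message.content' |
  sed '/<think>/,/<\/think>/d'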

Last updated: December 2025
Source: Groq Official Documentation