# Groq: Qwen 2 7B 4-bit

## Model Overview
| Property | Value |
|---|---|
| Model ID | groq/qwen2-7b-4bit |
| Name | Qwen 2 7B 4-bit |
| Parameters | 7B |
## Description

Qwen 2 7B 4-bit is a 4-bit quantized build of the Qwen 2 7B language model, served on Groq's LPU inference hardware. The quantization typically trades a small amount of accuracy for lower memory use and faster, cheaper token generation.
## Specifications
| Spec | Value |
|---|---|
| Context Window | 32K tokens |
| Max Completion | 8K tokens |
| Inference Speed | 350 tokens/second |
## Pricing
| Type | Price |
|---|---|
| Input | $0.02 per 1M tokens |
| Output | $0.02 per 1M tokens |
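With identical input and output rates of $0.02 per 1M tokens, per-request cost is a straightforward function of token counts. A minimal estimator (the function name is illustrative, not part of any official SDK):

```python
# Pricing for groq/qwen2-7b-4bit: $0.02 per 1M tokens, input and output alike.
INPUT_PRICE_PER_M = 0.02
OUTPUT_PRICE_PER_M = 0.02

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 10K-token prompt with a 2K-token completion costs $0.00024.
cost = estimate_cost(10_000, 2_000)
```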
## Capabilities
- Fast inference engine (Groq LPU)
- Cost-effective token processing
- Reliable production performance
- Streaming support
## Limitations
- 32K token context window
- Maximum completion tokens: 8K
- No image generation (inference only)
## Performance
Groq specializes in rapid inference with industry-leading token throughput. Typical use cases include:
- Real-time chat applications
- Batch processing with guaranteed latency
- High-volume inference workloads
- Cost-sensitive deployments
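As a rough illustration of the quoted 350 tokens/second figure, expected generation time for a completion can be estimated directly (a back-of-envelope sketch, not a measured benchmark):

```python
# Quoted throughput for groq/qwen2-7b-4bit on Groq's LPU.
TOKENS_PER_SECOND = 350

def generation_seconds(output_tokens: int) -> float:
    """Approximate wall-clock seconds to generate output_tokens."""
    return output_tokens / TOKENS_PER_SECOND

# Generating the maximum 8K completion takes roughly 23 seconds.
t = generation_seconds(8_000)
```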
## Best Practices
- Token Optimization: Craft prompts to minimize token usage while maintaining quality
- Streaming: Use streaming responses for real-time applications
- Batch Processing: Leverage high TPM limits for batch inference
- Context Management: Utilize full context window for complex tasks
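For streaming, the incremental text deltas arriving from an OpenAI-compatible stream can be rendered as they land and joined into the final message. The accumulation step below is a minimal sketch; the commented client wiring is an assumption about typical OpenAI-SDK usage, not verified against the live endpoint:

```python
def accumulate_stream(deltas):
    """Join incremental text deltas from a streamed chat completion.

    The final chunk of a stream often carries a None delta, so skip those.
    """
    parts = []
    for delta in deltas:
        if delta:
            parts.append(delta)
    return "".join(parts)

# Hypothetical wiring with an OpenAI-compatible client (not executed here):
# stream = client.chat.completions.create(
#     model="groq/qwen2-7b-4bit",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# text = accumulate_stream(chunk.choices[0].delta.content for chunk in stream)
```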
## Rate Limits
- 30,000 TPM (tokens per minute)
- Optimized for high-throughput inference
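Staying under the 30,000 TPM ceiling can be handled client-side with a simple per-minute token budget. This is a sketch; production code should also honor whatever rate-limit information the API returns rather than rely on local accounting alone:

```python
import time

class TokenBudget:
    """Client-side budget check against a tokens-per-minute limit."""

    def __init__(self, tpm: int = 30_000):
        self.tpm = tpm
        self.window_start = time.monotonic()
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Return True if the request fits in the current 60-second window."""
        now = time.monotonic()
        if now - self.window_start >= 60:
            # New minute: reset the window and the counter.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tpm:
            return False
        self.used += tokens
        return True
```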
## Features
- High-speed token generation (350 tokens/sec)
- 32K token context window
- Suited to lightweight Qwen deployments and cost-efficient inference
## Integration
Use the standard OpenAI-compatible API endpoint:
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/qwen2-7b-4bit",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
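The same request can be built in Python with only the standard library. The endpoint and model ID come from the curl example above; the helper name is illustrative:

```python
import json
import urllib.request

API_URL = "https://api.langmart.ai/v1/chat/completions"

def build_request(api_key: str) -> urllib.request.Request:
    """Build the POST request matching the curl example."""
    payload = {
        "model": "groq/qwen2-7b-4bit",
        "messages": [{"role": "user", "content": "Hello!"}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send it (requires a valid key and network access):
# with urllib.request.urlopen(build_request(os.environ["GROQ_API_KEY"])) as r:
#     print(json.load(r))
```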
## Resources
Last updated: December 2025
Source: Groq Official Documentation