Yi 34B Chat Model Documentation
Model Overview
| Property | Value |
|---|---|
| Model Name | Yi 34B Chat |
| Inference Model ID | 01-ai/yi-34b-chat |
| Creator | 01.AI |
| Release Date | December 7, 2023 |
| Last Updated | November 10, 2025 |
| Status | Active and Available |
Description
The Yi series models are large language models trained from scratch by developers at 01.AI. This 34B parameter model has been instruct-tuned specifically for chat applications, providing optimized performance for conversational tasks and instruction-following.
Technical Specifications
Architecture & Parameters
- Total Parameters: 34 billion
- Model Type: Instruct-tuned Chat Model
- Training Approach: Trained from scratch by 01.AI
- Model Format: ChatML
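The ChatML format wraps every turn of a conversation in special tokens. Below is a minimal sketch of how a message list maps onto that template; hosted providers such as LangMart normally apply this formatting server-side, so it is only needed for self-hosted inference:

def to_chatml(messages):
    # Render a list of {"role", "content"} dicts into the ChatML prompt format.
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model generates the reply.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))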
Context & Input/Output
| Property | Details |
|---|---|
| Context Window | 4,096 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Maximum Tokens | 4,096 (estimated) |
Stop Sequences
The model uses the following default stop sequences:
- <|im_start|>
- <|im_end|>
- <|endoftext|>
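If a provider does not apply these defaults automatically, they can be passed explicitly via the OpenAI-style "stop" parameter. This is an assumption about the backend; see the API examples later in this document for the full request shape:

# Assumption: the chat completions payload accepts an OpenAI-style "stop" list.
payload = {
    "model": "01-ai/yi-34b-chat",
    "messages": [{"role": "user", "content": "Give me one fun fact."}],
    "stop": ["<|im_end|>", "<|endoftext|>"],
}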
Pricing
Pricing details are available through the LangMart pricing API. Check the LangMart platform directly for current token rates; input and output pricing may vary by provider.
Use Cases
Ideal For
- General chat and conversational AI applications
- Customer service chatbots
- Q&A systems with conversational context
- Interactive tutoring and educational applications
- Content generation tasks
- Multi-turn dialogue systems
Not Recommended For
- Complex reasoning tasks requiring extended logic chains
- Tasks needing very long context windows (>4K tokens)
- Vision/multimodal tasks
- Real-time applications requiring very low latency
Integration with LangMart
To use this model in LangMart:
- Add 01-ai/yi-34b-chat to your provider connections
- Configure your LangMart API key
- Set context window to 4,096 tokens
- Use ChatML format for message formatting
- Handle stop sequences in response parsing
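The last step above can be as simple as trimming any ChatML stop tokens from the completion text. A small sketch, assuming the response body follows the OpenAI chat completions schema:

STOP_SEQUENCES = ["<|im_start|>", "<|im_end|>", "<|endoftext|>"]

def clean_completion(data):
    # Extract the assistant message and drop any leaked ChatML stop tokens.
    text = data["choices"][0]["message"]["content"]
    for token in STOP_SEQUENCES:
        text = text.replace(token, "")
    return text.strip()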
Capabilities & Features
Supported Capabilities
- Chat Completion: Full support for multi-turn conversations
- Instruction Following: Optimized for chat-based instruction execution
- Text Generation: General purpose text generation
- Conversation Management: Designed for extended dialogue contexts
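Multi-turn use boils down to resending the accumulated message history on every call and appending the assistant reply before the next turn. A minimal sketch against the LangMart endpoint shown later in this document (the response shape is assumed to follow the OpenAI chat completions schema):

import os
import requests

API_URL = "https://api.langmart.ai/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"}

# The conversation state is just the growing list of messages.
history = [{"role": "system", "content": "You are a helpful assistant."}]

for user_turn in ["Hi, who are you?", "Summarize that in five words."]:
    history.append({"role": "user", "content": user_turn})
    reply = requests.post(
        API_URL,
        headers=headers,
        json={"model": "01-ai/yi-34b-chat", "messages": history, "max_tokens": 512},
    ).json()["choices"][0]["message"]
    history.append(reply)  # keep the assistant turn so later turns have context
    print(reply["content"])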
Limitations
- No Reasoning Features: Does not support advanced reasoning capabilities
- Context Limitation: 4,096 token context window may limit longer conversations (see the trimming sketch after this list)
- Text-Only: Does not support image, audio, or other modalities
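One way to stay inside the 4,096-token window is to trim old turns before each request. The sketch below uses a rough character-per-token estimate rather than the real tokenizer, so treat it as an approximation; the tokenizer from the Hugging Face repository gives exact counts:

MAX_CONTEXT_TOKENS = 4096
RESERVED_FOR_OUTPUT = 1024  # leave room for the model's reply

def trim_history(messages, chars_per_token=4):
    # Rough heuristic: ~4 characters per token.
    budget = (MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT) * chars_per_token
    kept, used = [], 0
    for message in reversed(messages):  # newest turns first
        used += len(message["content"])
        if used > budget and kept:
            break
        kept.append(message)
    return list(reversed(kept))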
Model Weights & Implementation
- Official Weights: Available on Hugging Face
- Repository: 01-ai/Yi-34B-Chat
- Publicly Available: Yes - fully open-source model
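For self-hosted use, the weights can be pulled from the repository listed above with the huggingface_hub client. A minimal sketch; note that the full-precision checkpoint is tens of gigabytes:

from huggingface_hub import snapshot_download

# Downloads the full model repository (config, tokenizer, and weight shards).
local_path = snapshot_download(repo_id="01-ai/Yi-34B-Chat")
print(f"Model files stored at: {local_path}")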
Integration & Usage
LangMart API Usage
# Chat Completions
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "01-ai/yi-34b-chat",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How are you today?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'
Python Example
import os
import requests

api_key = os.environ["LANGMART_API_KEY"]  # or however you store your LangMart API key

response = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": "01-ai/yi-34b-chat",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain the concept of machine learning."},
        ],
        "temperature": 0.7,
        "max_tokens": 2048,
        "top_p": 0.9,
    },
)
print(response.json())
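Assuming the response body mirrors the OpenAI chat completions schema (which the request format above implies), the reply text itself sits under the first choice:

print(response.json()["choices"][0]["message"]["content"])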
JavaScript/Node.js Example
// Read the key from the environment (or however you store your LangMart API key).
const apiKey = process.env.LANGMART_API_KEY;

const response = await fetch("https://api.langmart.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "01-ai/yi-34b-chat",
    messages: [
      {
        role: "user",
        content: "What are the benefits of using LLMs?"
      }
    ],
    temperature: 0.7,
    max_tokens: 2048
  })
});

const data = await response.json();
console.log(data);
Recommended Parameters
| Parameter | Recommended Value | Range |
|---|---|---|
| temperature | 0.7 | 0.0 - 2.0 |
| top_p | 0.9 | 0.0 - 1.0 |
| max_tokens | 1024-2048 | 1 - 4096 |
| frequency_penalty | 0.0 | -2.0 - 2.0 |
| presence_penalty | 0.0 | -2.0 - 2.0 |
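As a convenience, the recommended defaults above can be kept in one place and merged into each request payload. A sketch; any value can be overridden per call:

RECOMMENDED_DEFAULTS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 2048,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

def build_payload(messages, **overrides):
    # Start from the recommended defaults and apply per-request overrides.
    payload = {"model": "01-ai/yi-34b-chat", "messages": messages}
    payload.update(RECOMMENDED_DEFAULTS)
    payload.update(overrides)
    return payload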
Comparison with Similar Models
| Model | Parameters | Context | Type |
|---|---|---|---|
| Yi 34B Chat | 34B | 4,096 | Chat-Optimized |
| Llama 2 34B | 34B | 4,096 | General |
| Mistral 7B | 7B | 8,192 | General |
Model Availability
- LangMart Status: ✓ Available
- Direct Access: Yes (Hugging Face)
- Deprecation Status: None indicated
- Marketplace Visibility: Public and fully visible
Inference Considerations
Performance Expectations
- Typical Latency: ~1-3 seconds for standard queries (depends on provider)
- Throughput: Suitable for standard production workloads
- Memory Requirements: Approximately 64-80 GB VRAM for full model inference
- Quantization: Available in various bit formats (4-bit, 8-bit, etc.)
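For local inference on a single GPU, a 4-bit load via Hugging Face transformers and bitsandbytes is a common option. This is a sketch, not an officially endorsed configuration; it assumes a recent transformers release with bitsandbytes installed, and memory needs drop to roughly 20-25 GB at 4-bit:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-34B-Chat"
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# The tokenizer bundles a ChatML chat template, so apply_chat_template handles
# the <|im_start|>/<|im_end|> wrapping shown earlier in this document.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello! How are you today?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))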
Best Practices
- Batching: For high-volume scenarios, consider batch processing
- Temperature Tuning: Adjust temperature based on desired creativity/determinism
- Token Budget: Plan for context + max_tokens to avoid truncation
- Stop Sequences: Ensure proper handling of ChatML stop tokens
- Error Handling: Implement retry logic for API rate limiting
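For the last point, a simple exponential backoff around the HTTP call is usually enough. A sketch, assuming rate limits surface as HTTP 429 responses:

import time
import requests

def post_with_retry(url, max_attempts=5, **kwargs):
    # Retry on rate limiting (429) with exponential backoff; raise on other errors.
    for attempt in range(max_attempts):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        time.sleep(2 ** attempt)
    raise RuntimeError("Rate limited: retries exhausted")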
Support & Documentation
For more information:
- Visit LangMart Model Documentation
- Check 01.AI Official Site
- Review Hugging Face Model Card
Additional Resources
- Model Card: Available on Hugging Face
- Training Documentation: 01.AI official documentation
- Community Discussion: Hugging Face model page
- API Documentation: LangMart API docs
Documentation generated from the LangMart AI Model Registry. Last updated: 2025-12-23