Yi 34B Chat Model Documentation
Model Overview
| Property | Value |
|---|---|
| Model Name | Yi 34B Chat |
| Inference Model ID | 01-ai/yi-34b-chat |
| Creator | 01.AI |
| Release Date | December 7, 2023 |
| Last Updated | November 10, 2025 |
| Status | Active and Available |
Description
The Yi series models are large language models trained from scratch by developers at 01.AI. This 34B parameter model has been instruct-tuned specifically for chat applications, providing optimized performance for conversational tasks and instruction-following.
Technical Specifications
Architecture & Parameters
- Total Parameters: 34 billion
- Model Type: Instruct-tuned Chat Model
- Training Approach: Trained from scratch by 01.AI
- Model Format: ChatML
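The ChatML format wraps every turn of a conversation in special tokens. Below is a minimal sketch of how a message list maps onto that template; hosted providers such as LangMart normally apply this formatting server-side, so it is only needed for self-hosted inference:

def to_chatml(messages):
    # Render a list of {"role", "content"} dicts into the ChatML prompt format.
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model generates the reply.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))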
Context & Input/Output
| Property | Details |
|---|---|
| Context Window | 4,096 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Maximum Tokens | 4,096 (estimated) |
Stop Sequences
The model uses the following default stop sequences:
- <|im_start|>
- <|im_end|>
- <|endoftext|>
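If a provider does not apply these defaults automatically, they can be passed explicitly via the OpenAI-style "stop" parameter. This is an assumption about the backend; see the API examples later in this document for the full request shape:

# Assumption: the chat completions payload accepts an OpenAI-style "stop" list.
payload = {
    "model": "01-ai/yi-34b-chat",
    "messages": [{"role": "user", "content": "Give me one fun fact."}],
    "stop": ["<|im_end|>", "<|endoftext|>"],
}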
Pricing
Pricing details are available through the LangMart pricing API. Check the LangMart platform directly for current token rates; input and output pricing may vary by provider.
Use Cases
Ideal For
- General chat and conversational AI applications
- Customer service chatbots
- Q&A systems with conversational context
- Interactive tutoring and educational applications
- Content generation tasks
- Multi-turn dialogue systems
Not Recommended For
- Complex reasoning tasks requiring extended logic chains
- Tasks needing very long context windows (>4K tokens)
- Vision/multimodal tasks
- Real-time applications requiring very low latency
Integration with LangMart
To use this model in LangMart:
- Add 01-ai/yi-34b-chat to your provider connections
- Configure your LangMart API key
- Set context window to 4,096 tokens
- Use ChatML format for message formatting
- Handle stop sequences in response parsing
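The last step above can be as simple as trimming any ChatML stop tokens from the completion text. A small sketch, assuming the response body follows the OpenAI chat completions schema:

STOP_SEQUENCES = ["<|im_start|>", "<|im_end|>", "<|endoftext|>"]

def clean_completion(data):
    # Extract the assistant message and drop any leaked ChatML stop tokens.
    text = data["choices"][0]["message"]["content"]
    for token in STOP_SEQUENCES:
        text = text.replace(token, "")
    return text.strip()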
Capabilities & Features
Supported Capabilities
- Chat Completion: Full support for multi-turn conversations
- Instruction Following: Optimized for chat-based instruction execution
- Text Generation: General purpose text generation
- Conversation Management: Designed for extended dialogue contexts
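Multi-turn use boils down to resending the accumulated message history on every call and appending the assistant reply before the next turn. A minimal sketch against the LangMart endpoint shown later in this document (the response shape is assumed to follow the OpenAI chat completions schema):

import os
import requests

API_URL = "https://api.langmart.ai/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"}

# The conversation state is just the growing list of messages.
history = [{"role": "system", "content": "You are a helpful assistant."}]

for user_turn in ["Hi, who are you?", "Summarize that in five words."]:
    history.append({"role": "user", "content": user_turn})
    reply = requests.post(
        API_URL,
        headers=headers,
        json={"model": "01-ai/yi-34b-chat", "messages": history, "max_tokens": 512},
    ).json()["choices"][0]["message"]
    history.append(reply)  # keep the assistant turn so later turns have context
    print(reply["content"])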
Limitations
- No Reasoning Features: Does not support advanced reasoning capabilities
- Context Limitation: 4,096 token context window may limit longer conversations (see the trimming sketch after this list)
- Text-Only: Does not support image, audio, or other modalities
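One way to stay inside the 4,096-token window is to trim old turns before each request. The sketch below uses a rough character-per-token estimate rather than the real tokenizer, so treat it as an approximation; the tokenizer from the Hugging Face repository gives exact counts:

MAX_CONTEXT_TOKENS = 4096
RESERVED_FOR_OUTPUT = 1024  # leave room for the model's reply

def trim_history(messages, chars_per_token=4):
    # Rough heuristic: ~4 characters per token.
    budget = (MAX_CONTEXT_TOKENS - RESERVED_FOR_OUTPUT) * chars_per_token
    kept, used = [], 0
    for message in reversed(messages):  # newest turns first
        used += len(message["content"])
        if used > budget and kept:
            break
        kept.append(message)
    return list(reversed(kept))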
Model Weights & Implementation
- Official Weights: Available on Hugging Face
- Repository: 01-ai/Yi-34B-Chat
- Publicly Available: Yes - fully open-source model
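For self-hosted use, the weights can be pulled from the repository listed above with the huggingface_hub client. A minimal sketch; note that the full-precision checkpoint is tens of gigabytes:

from huggingface_hub import snapshot_download

# Downloads the full model repository (config, tokenizer, and weight shards).
local_path = snapshot_download(repo_id="01-ai/Yi-34B-Chat")
print(f"Model files stored at: {local_path}")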
Integration & Usage
LangMart API Usage
# Chat Completions
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "01-ai/yi-34b-chat",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How are you today?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'
Python Example
import os
import requests

api_key = os.environ["LANGMART_API_KEY"]  # or however you store your LangMart API key

response = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "model": "01-ai/yi-34b-chat",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain the concept of machine learning."},
        ],
        "temperature": 0.7,
        "max_tokens": 2048,
        "top_p": 0.9,
    },
)
print(response.json())
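Assuming the response body mirrors the OpenAI chat completions schema (which the request format above implies), the reply text itself sits under the first choice:

print(response.json()["choices"][0]["message"]["content"])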
JavaScript/Node.js Example
// Read the key from the environment (or however you store your LangMart API key).
const apiKey = process.env.LANGMART_API_KEY;

const response = await fetch("https://api.langmart.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "01-ai/yi-34b-chat",
    messages: [
      {
        role: "user",
        content: "What are the benefits of using LLMs?"
      }
    ],
    temperature: 0.7,
    max_tokens: 2048
  })
});

const data = await response.json();
console.log(data);
Recommended Parameters
| Parameter | Recommended Value | Range |
|---|---|---|
| temperature | 0.7 | 0.0 - 2.0 |
| top_p | 0.9 | 0.0 - 1.0 |
| max_tokens | 1024-2048 | 1 - 4096 |
| frequency_penalty | 0.0 | -2.0 - 2.0 |
| presence_penalty | 0.0 | -2.0 - 2.0 |
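As a convenience, the recommended defaults above can be kept in one place and merged into each request payload. A sketch; any value can be overridden per call:

RECOMMENDED_DEFAULTS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 2048,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

def build_payload(messages, **overrides):
    # Start from the recommended defaults and apply per-request overrides.
    payload = {"model": "01-ai/yi-34b-chat", "messages": messages}
    payload.update(RECOMMENDED_DEFAULTS)
    payload.update(overrides)
    return payload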
Comparison with Similar Models
| Model | Parameters | Context | Type |
|---|---|---|---|
| Yi 34B Chat | 34B | 4,096 | Chat-Optimized |
| Llama 2 34B | 34B | 4,096 | General |
| Mistral 7B | 7B | 8,192 | General |
Model Availability
- LangMart Status: ✓ Available
- Direct Access: Yes (Hugging Face)
- Deprecation Status: None indicated
- Marketplace Visibility: Public and fully visible
Inference Considerations
Performance Expectations
- Typical Latency: ~1-3 seconds for standard queries (depends on provider)
- Throughput: Suitable for standard production workloads
- Memory Requirements: Approximately 64-80 GB VRAM for full model inference
- Quantization: Available in various bit formats (4-bit, 8-bit, etc.)
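For local inference on a single GPU, a 4-bit load via Hugging Face transformers and bitsandbytes is a common option. This is a sketch, not an officially endorsed configuration; it assumes a recent transformers release with bitsandbytes installed, and memory needs drop to roughly 20-25 GB at 4-bit:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "01-ai/Yi-34B-Chat"
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# The tokenizer bundles a ChatML chat template, so apply_chat_template handles
# the <|im_start|>/<|im_end|> wrapping shown earlier in this document.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello! How are you today?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))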
Best Practices
- Batching: For high-volume scenarios, consider batch processing
- Temperature Tuning: Adjust temperature based on desired creativity/determinism
- Token Budget: Plan for context + max_tokens to avoid truncation
- Stop Sequences: Ensure proper handling of ChatML stop tokens
- Error Handling: Implement retry logic for API rate limiting
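For the last point, a simple exponential backoff around the HTTP call is usually enough. A sketch, assuming rate limits surface as HTTP 429 responses:

import time
import requests

def post_with_retry(url, max_attempts=5, **kwargs):
    # Retry on rate limiting (429) with exponential backoff; raise on other errors.
    for attempt in range(max_attempts):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        time.sleep(2 ** attempt)
    raise RuntimeError("Rate limited: retries exhausted")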
Support & Documentation
For more information:
- Visit LangMart Model Documentation
- Check 01.AI Official Site
- Review Hugging Face Model Card
Additional Resources
- Model Card: Available on Hugging Face
- Training Documentation: 01.AI official documentation
- Community Discussion: Hugging Face model page
- API Documentation: LangMart API docs
Documentation generated from the LangMart AI Model Registry. Last updated: 2025-12-23