Baidu: ERNIE 4.5 21B A3B Thinking
Model ID: baidu/ernie-4.5-21b-a3b-thinking
Provider: Baidu (via NovitaAI)
Category: Reasoning Model, Multilingual
Release Date: October 9, 2025
Parameters: 21 billion
Overview
ERNIE-4.5-21B-A3B-Thinking is Baidu's upgraded lightweight MoE (Mixture of Experts) model, refined to boost reasoning depth and quality. It delivers top-tier performance on logical puzzles, mathematics, science, coding, and text generation, as well as on expert-level academic benchmarks, and it excels at multilingual reasoning tasks.
Technical Specifications
| Property | Value |
|---|---|
| Parameters | 21 billion |
| Architecture | Mixture of Experts (MoE); ~3B parameters activated per token (the "A3B" in the name) |
| Context Length | 131,072 tokens |
| Input Modalities | Text |
| Output Modalities | Text |
| Max Completion Tokens | 65,536 |
| Reasoning Format | `<think></think>` tags |
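Given the limits above, a quick pre-flight budget check can prevent truncated completions. The sketch below is illustrative; the token counts passed in are assumptions, and real counts depend on the model's tokenizer.

```python
# Pre-flight check against the documented limits for this model.
CONTEXT_LIMIT = 131_072    # context window
MAX_COMPLETION = 65_536    # max completion tokens

def fits(prompt_tokens: int, completion_budget: int) -> bool:
    """True if a request stays inside both documented limits."""
    return (completion_budget <= MAX_COMPLETION
            and prompt_tokens + completion_budget <= CONTEXT_LIMIT)

print(fits(100_000, 16_384))  # True
print(fits(120_000, 16_384))  # False: 136,384 exceeds the 131,072 window
```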
Pricing
Via NovitaAI Provider:
| Type | Price |
|---|---|
| Input | $0.06 per 1M tokens |
| Output | $0.22 per 1M tokens |
Quantization: Standard
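To see how these rates translate into spend, the following sketch estimates the cost of a single request at the published prices; the token counts are hypothetical.

```python
# Cost estimate at NovitaAI's published rates for this model.
INPUT_PRICE_PER_M = 0.06   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.22  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a long prompt plus a verbose reasoning trace (assumed sizes).
print(f"${estimate_cost(20_000, 8_000):.4f}")  # -> $0.0030
```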
Capabilities
Reasoning & Analysis
- Deep reasoning with thinking tokens
- Logical puzzle solving
- Mathematical problem solving
- Scientific reasoning
- Code understanding and generation
- Text generation and summarization
- Expert-level academic benchmarks
Input Modalities
- Text only
Output Modalities
- Text only
Key Features
- Reasoning support with configurable depth
- Reasoning output wrapped in `<think></think>` tags
- Mixture of Experts architecture for efficiency
- Lightweight 21B parameters
- Extended context window (131,072 tokens)
- Structured outputs support
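A minimal sketch of requesting structured output, assuming the NovitaAI endpoint honors the OpenAI-style `response_format` field (not confirmed here); the API key and prompt are placeholders.

```python
from openai import OpenAI

# Placeholder key; base URL is taken from the provider section below.
client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="baidu/ernie-4.5-21b-a3b-thinking",
    messages=[{
        "role": "user",
        "content": 'Reply with JSON of the form {"answer": <int>} for 17 * 23.',
    }],
    # Assumption: the gateway honors the OpenAI-style JSON-mode flag.
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)  # e.g. {"answer": 391}
```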
Supported Parameters
- `reasoning` - Enable/configure reasoning mode
- `include_reasoning` - Include reasoning in output
- `max_tokens` - Maximum output tokens
- `temperature` - Sampling temperature control
- `top_p` - Nucleus sampling parameter
- `stop` - Stop sequences for output termination
- `frequency_penalty` - Reduce repetitive tokens
- `presence_penalty` - Encourage diverse content
- `seed` - Random seed for reproducibility
- `top_k` - Top-K sampling parameter
- `repetition_penalty` - Control repetition
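A hedged example of passing several of these parameters through an OpenAI-compatible client. `reasoning` and `include_reasoning` are not standard OpenAI fields, so the sketch forwards them via `extra_body`; whether the gateway accepts them in this exact shape is an assumption.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="baidu/ernie-4.5-21b-a3b-thinking",
    messages=[{"role": "user", "content": "Prove that the sum of two odd integers is even."}],
    max_tokens=2048,   # well under the 65,536 completion cap
    temperature=0.3,   # low temperature for logical tasks (see Best Practices)
    top_p=0.95,
    seed=42,           # reproducibility across runs
    # Non-standard fields; assumed to be forwarded unchanged by the gateway.
    extra_body={"reasoning": {"enabled": True}, "include_reasoning": True},
)
print(resp.choices[0].message.content)
```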
Use Cases
- Mathematical Problem Solving: Complex math and logic puzzles
- Scientific Analysis: Scientific reasoning and research support
- Code Generation: Programming and code understanding
- Academic Research: Expert-level analysis and writing
- Multilingual Reasoning: Reasoning across multiple languages
- Technical Writing: Documentation and technical content
- Logic Puzzles: Complex logical reasoning challenges
- Coding Interviews: Preparation and problem solving
Limitations
- Text Only: No image or video understanding
- Reasoning Output: Must enable reasoning explicitly
- Moderate Context: 131K tokens (smaller than some alternatives)
Best Practices
- Enable Reasoning: Always enable reasoning for complex tasks
- Leverage MoE: Architecture is optimized for efficiency
- Temperature Setting: Lower temperature (0.3-0.7) for logical tasks
- Context Usage: Use full 131K token context for long documents
- Multilingual: Good for applications requiring cross-language reasoning
Related Models
- DeepSeek: DeepSeek V3.2 - Alternative reasoning model
- AllenAI: Olmo 3.1 32B Think - Free reasoning alternative
- Google: Gemini 3 Flash Preview - Multimodal reasoning
- Anthropic: Claude 3.5 Sonnet - Alternative reasoning model
Performance Metrics
Benchmark Performance
- Top-tier performance on:
  - Logical Puzzles: Advanced logic and reasoning tasks
  - Mathematics: Complex mathematical problem solving
  - Science: Scientific reasoning and analysis
  - Coding: Code generation and understanding
  - Academic Tasks: Expert-level academic benchmarks
Usage Statistics (Recent)
- Consistent daily usage patterns
- Request volumes vary by workload, consistent with adoption for reasoning-intensive tasks
- Strong performance across multilingual inputs
Provider Information
Primary Provider: NovitaAI
- Adapter: NovitaAdapter
- Base URL: https://api.novita.ai/v3/openai
- Quantization: Standard
- Max Completion Tokens: 65,536
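For clients that do not use an SDK, a plain HTTP sketch against the base URL above. The `/chat/completions` path and bearer-token header follow the usual OpenAI-compatible convention, which is assumed to hold for this gateway.

```python
import requests

BASE_URL = "https://api.novita.ai/v3/openai"

payload = {
    "model": "baidu/ernie-4.5-21b-a3b-thinking",
    "messages": [{"role": "user", "content": "Which is larger, 2**10 or 10**3?"}],
    "max_tokens": 512,
}

# OpenAI-style chat completions path and bearer auth (assumed for this gateway).
r = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```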
Data Policy
- Training Use: Not used for model training
- Prompt Retention: Prompts not retained
- Publishing: Cannot publish outputs without permission
Multilingual Support
ERNIE 4.5 excels at reasoning across multiple languages:
- Chinese: Native performance
- English: Proficient
- Other Languages: Strong multilingual reasoning
Advantages
Efficiency
- 21B Parameters: Lightweight compared to larger reasoning models
- MoE Architecture: Efficient expert routing
- Cost-Effective: Competitive pricing for reasoning capability
Performance
- Expert-Level Benchmarks: Top performance on academic tasks
- Reasoning Depth: Strong logical and mathematical reasoning
- Multilingual: Excellent cross-language reasoning
Value
- Balanced Pricing: Mid-range cost for strong reasoning
- Versatile: Works well across multiple domains
Comparison with Alternatives
| Model | Parameters | Reasoning | Multimodal | Cost (Input) |
|---|---|---|---|---|
| ERNIE 4.5 21B | 21B | Yes | No | $0.06/M |
| DeepSeek V3.2 | Large | Yes | No | $0.224/M |
| Gemini 3 Flash | Undisclosed | Yes | Yes | $0.50/M |
| Claude 3.5 Sonnet | Undisclosed | Yes | Yes | $3/M |
Output Format
Reasoning Output
When reasoning is enabled, the model returns output in this format:
```
<think>
[Internal reasoning chain and logic]
</think>

[Final answer based on reasoning]
```
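A small helper for separating the reasoning trace from the final answer; it assumes at most one `<think>...</think>` block at the start of the text.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response using <think></think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()  # reasoning disabled or stripped upstream
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(reasoning)  # 2 + 2 = 4
print(answer)     # The answer is 4.
```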
Additional Notes
- Baidu Research: Backed by Baidu's strong AI research capabilities
- Lightweight Design: 21B parameters make it suitable for varied deployments
- Competitive Pricing: Excellent value for reasoning capabilities
- MoE Benefits: Mixture of Experts provides capability without size penalty
- Emerging Strong: Growing adoption for reasoning-focused applications
- Quality Assurance: Expert-level benchmark performance validates quality
Training & Data
- Trained on 9 trillion tokens (inherited from ERNIE base)
- Focus on reasoning and instruction following
- Academic and technical domain emphasis
- Multilingual training dataset
Recommendation Scenarios
Ideal for:
- Cost-conscious reasoning applications
- Multilingual reasoning tasks
- Academic and research applications
- Code generation and analysis
- Technical problem solving
Consider alternatives for:
- Vision/multimodal requirements
- Maximum reasoning depth (use larger models)
- Real-time high-volume applications (check rate limits)