# Qwen: Qwen3 235B A22B

Inference Model ID: `qwen/qwen3-235b-a22b`

## Overview
| Property | Value |
|---|---|
| Provider | Qwen (Alibaba Cloud) |
| Model ID | `qwen/qwen3-235b-a22b` |
| Permaslug | `qwen/qwen3-235b-a22b-04-28` |
| Created | April 28, 2025 |
| Context Length | 40,960 tokens at this endpoint (native 32,768; extendable to 131,072 with YaRN) |
| Max Completion Tokens | 40,960 |
| Input Modalities | Text |
| Output Modalities | Text |
## Description

Qwen3-235B-A22B is a 235-billion-parameter mixture-of-experts (MoE) model developed by Qwen (Alibaba Cloud), activating 22 billion parameters per forward pass. Because only the routed experts run for each token, per-token compute tracks the 22B active parameters rather than the full 235B, giving the model strong capability at a fraction of a comparable dense model's inference cost.
The model supports seamless switching between two operational modes:

- "Thinking" Mode: For complex reasoning, mathematics, and code tasks; emits explicit reasoning tokens delimited by `<think>` and `</think>` (see the parsing sketch under the Reasoning Mode Example below)
- "Non-thinking" Mode: For general conversational use with faster response times
Key characteristics:
- Strong Reasoning Ability: Excels at mathematical problem-solving and complex logical tasks
- Multilingual Support: Supports 100+ languages and dialects
- Advanced Instruction-Following: High accuracy in following complex instructions
- Agent Tool-Calling: Native support for tool/function calling capabilities
- Extended Context: Native 32K context window, extendable to 131K tokens using YaRN-based scaling
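The YaRN extension applies when running the open-weights checkpoint yourself rather than calling a hosted endpoint. A minimal sketch of enabling it, assuming the Hugging Face-style `rope_scaling` block in `config.json` (factor 4.0 maps the native 32,768 tokens to 131,072; verify the exact field names against your serving runtime):

```python
import json

# Hypothetical local path to a downloaded Qwen3-235B-A22B checkpoint.
CONFIG_PATH = "Qwen3-235B-A22B/config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# YaRN-style RoPE scaling: factor 4.0 stretches the 32,768-token native
# window to roughly 131,072 tokens. Field names follow the common
# Hugging Face rope_scaling convention; confirm against your runtime.
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```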
## Pricing

| Type | Price |
|---|---|
| Input Tokens | $0.18 per million |
| Output Tokens | $0.54 per million |
| Request Fee | $0.00 |
### Cost Comparison
- This model offers competitive pricing for a 235B parameter MoE model
- The MoE architecture (22B active parameters) enables lower per-token costs compared to dense models of similar capability
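To make the table above concrete, a rough per-request cost estimator (prices hardcoded from this page; check the registry for current rates):

```python
# Per-million-token prices from the Pricing table above; verify before use.
INPUT_PRICE_PER_M = 0.18
OUTPUT_PRICE_PER_M = 0.54

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 1,500-token completion
# costs about $0.00036 + $0.00081 = ~$0.00117.
print(f"${estimate_cost(2_000, 1_500):.5f}")
```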
## Supported Parameters

| Parameter | Type | Description |
|---|---|---|
| `reasoning` | boolean | Enable reasoning mode with explicit thinking |
| `include_reasoning` | boolean | Include reasoning tokens in the response |
| `max_tokens` | integer | Maximum number of tokens to generate |
| `temperature` | float | Sampling temperature (0-2) |
| `top_p` | float | Nucleus sampling probability |
| `top_k` | integer | Top-k sampling parameter |
| `stop` | array | Stop sequences |
| `frequency_penalty` | float | Frequency penalty for token repetition |
| `presence_penalty` | float | Presence penalty encouraging new topics |
| `repetition_penalty` | float | Repetition penalty factor |
| `seed` | integer | Seed for reproducible outputs |
| `min_p` | float | Minimum probability threshold |
| `response_format` | object | Format specification for the response |
| `tools` | array | List of tools available to the model |
| `tool_choice` | string/object | Tool selection mode: `auto`, `none`, or a specific tool |
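A sketch of a request body exercising several of the sampling parameters, assuming the endpoint accepts them as top-level JSON fields alongside `model` and `messages` (as the curl examples below do):

```python
import os
import requests

# Illustrative request combining sampling parameters from the table above.
payload = {
    "model": "qwen/qwen3-235b-a22b",
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "max_tokens": 256,
    "temperature": 0.8,        # sampling temperature (0-2)
    "top_p": 0.95,             # nucleus sampling
    "min_p": 0.05,             # minimum-probability cutoff
    "repetition_penalty": 1.05,
    "seed": 42,                # best-effort reproducibility
}

resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"},
    json=payload,
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```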
## Default Configuration

- Default Stop Tokens: `<|im_start|>`, `<|im_end|>`
- Instruction Type: Qwen3
## Features

- Reasoning Tokens: Uses `<think>` and `</think>` tokens for explicit reasoning
- Tool Calling: Native support for function/tool calling
- Prompt Caching: Supported for repeated requests
- Multipart Input: Supports the multipart message format (see the sketch below)
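In multipart form, `content` carries a list of typed parts instead of a plain string. A sketch assuming the common OpenAI-style content-array format (this model is text-only, so only `text` parts apply):

```python
# Multipart message: "content" is a list of typed parts rather than a string.
# Qwen3-235B-A22B is text-only, so only "text" parts are meaningful here.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize the following passage:"},
        {"type": "text", "text": "Mixture-of-experts models route each token..."},
    ],
}
```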
## Architecture Details

| Specification | Value |
|---|---|
| Total Parameters | 235 billion |
| Active Parameters | 22 billion per forward pass |
| Architecture | Mixture-of-Experts (MoE) |
| Native Context | 32,768 tokens |
| Extended Context | 131,072 tokens (via YaRN scaling) |
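A toy sketch of the routing idea behind those numbers: a gating network scores all experts for each token and only the top-k run, so compute scales with active rather than total parameters. The expert count, k, and dimensions below are illustrative only, not the model's actual internals:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # illustrative; not the model's actual expert count
TOP_K = 8           # experts activated per token (also illustrative)
D_MODEL = 64        # toy hidden size

gate_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))
expert_ws = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts, gate-weighted."""
    logits = x @ gate_w                        # (NUM_EXPERTS,) router scores
    top = np.argsort(logits)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over selected experts
    # Only TOP_K of NUM_EXPERTS expert MLPs execute, so compute scales with
    # the "active" parameter count (22B of 235B in the real model).
    return sum(w * np.tanh(x @ expert_ws[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (64,) -- only 8 of 128 experts ran
```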
Recent Usage Statistics
| Metric |
Value |
| Peak Daily Requests |
~150,000+ (December 16, 2025) |
| Reasoning Token Generation |
20-40M tokens daily |
| Tool Call Error Rate |
~0.05% |
Capabilities
- Reasoning: Strong performance on mathematical and logical reasoning benchmarks
- Code Generation: Proficient in multiple programming languages
- Multilingual: Supports 100+ languages and dialects
- Long Context: Effective utilization of extended context windows
## API Usage Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "Explain the concept of mixture-of-experts architecture in neural networks."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'
```
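The endpoint path suggests an OpenAI-compatible API, so the same request can presumably be made with the official `openai` Python client pointed at the LangMart base URL (an assumption to verify against LangMart's documentation):

```python
import os
from openai import OpenAI

# Assumes the endpoint is OpenAI-compatible, as /v1/chat/completions suggests.
client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key=os.environ["LANGMART_API_KEY"],
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b",
    messages=[
        {"role": "user", "content": "Explain the concept of mixture-of-experts "
                                    "architecture in neural networks."}
    ],
    max_tokens=2048,
    temperature=0.7,
)
print(response.choices[0].message.content)
```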
## Reasoning Mode Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "Solve this step by step: If a train travels 120 km in 2 hours, then stops for 30 minutes, then travels 90 km in 1.5 hours, what is the average speed for the entire journey?"}
    ],
    "reasoning": true,
    "include_reasoning": true,
    "max_tokens": 4096
  }'
```
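Depending on the provider, the reasoning may come back as a separate field or inline between `<think>` and `</think>` in the message content. A minimal sketch handling the inline case (the `reasoning` fallback field name is an assumption; inspect a real response to confirm):

```python
import re

def split_reasoning(message: dict) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer.

    Falls back to a "reasoning" field on the message if present; that field
    name is an assumption -- inspect an actual API response to confirm.
    """
    content = message.get("content") or ""
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = (content[:match.start()] + content[match.end():]).strip()
        return reasoning, answer
    return (message.get("reasoning") or "").strip(), content.strip()

# Example with inline tags (210 km total over 4 h total, including the stop):
msg = {"content": "<think>120 + 90 = 210 km; 2 + 0.5 + 1.5 = 4 h</think>"
                  "The average speed is 52.5 km/h."}
reasoning, answer = split_reasoning(msg)
print(answer)  # The average speed is 52.5 km/h.
```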
## Tool Calling Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
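When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain content; the caller runs the function and sends the result back in a `role: "tool"` message. A sketch of that round trip, assuming the standard OpenAI-style tool-calling schema and a hypothetical local `get_weather` implementation:

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.langmart.ai/v1",
                api_key=os.environ["LANGMART_API_KEY"])

def get_weather(location: str) -> str:
    """Hypothetical local implementation backing the get_weather tool."""
    return json.dumps({"location": location, "temp_c": 18, "sky": "clear"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string",
                                        "description": "City name"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]
resp = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b", messages=messages,
    tools=tools, tool_choice="auto")

call = resp.choices[0].message.tool_calls[0]   # the model asked for a tool
args = json.loads(call.function.arguments)     # e.g. {"location": "Tokyo"}

# Append the assistant turn and the tool result, then ask for the final answer.
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": get_weather(**args)})
final = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b", messages=messages, tools=tools)
print(final.choices[0].message.content)
```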
## Qwen3 Model Family

| Model ID | Parameters | Active Params | Description |
|---|---|---|---|
| `qwen/qwen3-235b-a22b` | 235B | 22B | Flagship MoE model |
| `qwen/qwen3-32b` | 32B | 32B | Dense model variant |
| `qwen/qwen3-14b` | 14B | 14B | Mid-size dense model |
| `qwen/qwen3-7b` | 7B | 7B | Lightweight model |
| `qwen/qwen3-4b` | 4B | 4B | Compact model |
## Similar MoE Models

| Model ID | Description |
|---|---|
| `deepseek/deepseek-v3` | DeepSeek V3 (MoE architecture) |
| `mistralai/mixtral-8x22b` | Mixtral 8x22B (8-expert MoE) |
| `databricks/dbrx-instruct` | DBRX Instruct (MoE model) |
## Providers

### Primary Provider: DeepInfra

| Property | Value |
|---|---|
| Provider | DeepInfra |
| Quantization | FP8 |
| Context Length | 40,960 tokens |
| Max Completion Tokens | 40,960 |
| Tool Support | Yes |
| Multipart Support | Yes |
| Reasoning Support | Yes |
## Notes
- This model is part of Qwen's third-generation model family released in April 2025
- The MoE architecture (235B total, 22B active) provides an excellent balance of capability and efficiency
- Supports both "thinking" mode for complex tasks and "non-thinking" mode for quick responses
- YaRN-based context extension allows handling documents up to 131K tokens
- Currently available through DeepInfra with FP8 quantization
- Native support for tool calling makes it suitable for agent-based applications
Source: LangMart Model Registry
Last Updated: December 23, 2025