# Qwen: Qwen3 235B A22B

Inference Model ID: `qwen/qwen3-235b-a22b`

## Overview
| Property | Value |
|---|---|
| Provider | Qwen (Alibaba Cloud) |
| Model ID | `qwen/qwen3-235b-a22b` |
| Permaslug | `qwen/qwen3-235b-a22b-04-28` |
| Created | April 28, 2025 |
| Context Length | 40,960 tokens at this endpoint (native 32,768; extendable to 131,072 with YaRN) |
| Max Completion Tokens | 40,960 |
| Input Modalities | Text |
| Output Modalities | Text |
## Description

Qwen3-235B-A22B is a 235-billion-parameter mixture-of-experts (MoE) model developed by Qwen (Alibaba Cloud), activating 22 billion parameters per forward pass. Because only the routed experts run for each token, per-token compute tracks the 22B active parameters rather than the full 235B, giving the model strong capability at a fraction of a comparable dense model's inference cost.
The model supports seamless switching between two operational modes:

- "Thinking" Mode: For complex reasoning, mathematics, and code tasks; emits explicit reasoning tokens delimited by `<think>` and `</think>` (see the parsing sketch under the Reasoning Mode Example below)
- "Non-thinking" Mode: For general conversational use with faster response times
Key characteristics:
- Strong Reasoning Ability: Excels at mathematical problem-solving and complex logical tasks
- Multilingual Support: Supports 100+ languages and dialects
- Advanced Instruction-Following: High accuracy in following complex instructions
- Agent Tool-Calling: Native support for tool/function calling capabilities
- Extended Context: Native 32K context window, extendable to 131K tokens using YaRN-based scaling
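The YaRN extension applies when running the open-weights checkpoint yourself rather than calling a hosted endpoint. A minimal sketch of enabling it, assuming the Hugging Face-style `rope_scaling` block in `config.json` (factor 4.0 maps the native 32,768 tokens to 131,072; verify the exact field names against your serving runtime):

```python
import json

# Hypothetical local path to a downloaded Qwen3-235B-A22B checkpoint.
CONFIG_PATH = "Qwen3-235B-A22B/config.json"

with open(CONFIG_PATH) as f:
    config = json.load(f)

# YaRN-style RoPE scaling: factor 4.0 stretches the 32,768-token native
# window to roughly 131,072 tokens. Field names follow the common
# Hugging Face rope_scaling convention; confirm against your runtime.
config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```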
## Pricing

| Type | Price |
|---|---|
| Input Tokens | $0.18 per million |
| Output Tokens | $0.54 per million |
| Request Fee | $0.00 |
### Cost Comparison
- This model offers competitive pricing for a 235B parameter MoE model
- The MoE architecture (22B active parameters) enables lower per-token costs compared to dense models of similar capability
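To make the table above concrete, a rough per-request cost estimator (prices hardcoded from this page; check the registry for current rates):

```python
# Per-million-token prices from the Pricing table above; verify before use.
INPUT_PRICE_PER_M = 0.18
OUTPUT_PRICE_PER_M = 0.54

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 1,500-token completion
# costs about $0.00036 + $0.00081 = ~$0.00117.
print(f"${estimate_cost(2_000, 1_500):.5f}")
```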
## Supported Parameters

| Parameter | Type | Description |
|---|---|---|
| `reasoning` | boolean | Enable reasoning mode with explicit thinking |
| `include_reasoning` | boolean | Include reasoning tokens in the response |
| `max_tokens` | integer | Maximum number of tokens to generate |
| `temperature` | float | Sampling temperature (0-2) |
| `top_p` | float | Nucleus sampling probability |
| `top_k` | integer | Top-k sampling parameter |
| `stop` | array | Stop sequences |
| `frequency_penalty` | float | Frequency penalty for token repetition |
| `presence_penalty` | float | Presence penalty encouraging new topics |
| `repetition_penalty` | float | Repetition penalty factor |
| `seed` | integer | Seed for reproducible outputs |
| `min_p` | float | Minimum probability threshold |
| `response_format` | object | Format specification for the response |
| `tools` | array | List of tools available to the model |
| `tool_choice` | string/object | Tool selection mode: `auto`, `none`, or a specific tool |
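A sketch of a request body exercising several of the sampling parameters, assuming the endpoint accepts them as top-level JSON fields alongside `model` and `messages` (as the curl examples below do):

```python
import os
import requests

# Illustrative request combining sampling parameters from the table above.
payload = {
    "model": "qwen/qwen3-235b-a22b",
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "max_tokens": 256,
    "temperature": 0.8,        # sampling temperature (0-2)
    "top_p": 0.95,             # nucleus sampling
    "min_p": 0.05,             # minimum-probability cutoff
    "repetition_penalty": 1.05,
    "seed": 42,                # best-effort reproducibility
}

resp = requests.post(
    "https://api.langmart.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['LANGMART_API_KEY']}"},
    json=payload,
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```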
## Default Configuration

- Default Stop Tokens: `<|im_start|>`, `<|im_end|>`
- Instruction Type: Qwen3
## Features

- Reasoning Tokens: Uses `<think>` and `</think>` tokens for explicit reasoning
- Tool Calling: Native support for function/tool calling
- Prompt Caching: Supported for repeated requests
- Multipart Input: Supports the multipart message format (see the sketch below)
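In multipart form, `content` carries a list of typed parts instead of a plain string. A sketch assuming the common OpenAI-style content-array format (this model is text-only, so only `text` parts apply):

```python
# Multipart message: "content" is a list of typed parts rather than a string.
# Qwen3-235B-A22B is text-only, so only "text" parts are meaningful here.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize the following passage:"},
        {"type": "text", "text": "Mixture-of-experts models route each token..."},
    ],
}
```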
## Architecture Details

| Specification | Value |
|---|---|
| Total Parameters | 235 billion |
| Active Parameters | 22 billion per forward pass |
| Architecture | Mixture-of-Experts (MoE) |
| Native Context | 32,768 tokens |
| Extended Context | 131,072 tokens (via YaRN scaling) |
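A toy sketch of the routing idea behind those numbers: a gating network scores all experts for each token and only the top-k run, so compute scales with active rather than total parameters. The expert count, k, and dimensions below are illustrative only, not the model's actual internals:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 128   # illustrative; not the model's actual expert count
TOP_K = 8           # experts activated per token (also illustrative)
D_MODEL = 64        # toy hidden size

gate_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))
expert_ws = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts, gate-weighted."""
    logits = x @ gate_w                        # (NUM_EXPERTS,) router scores
    top = np.argsort(logits)[-TOP_K:]          # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over selected experts
    # Only TOP_K of NUM_EXPERTS expert MLPs execute, so compute scales with
    # the "active" parameter count (22B of 235B in the real model).
    return sum(w * np.tanh(x @ expert_ws[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (64,) -- only 8 of 128 experts ran
```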
Recent Usage Statistics
| Metric |
Value |
| Peak Daily Requests |
~150,000+ (December 16, 2025) |
| Reasoning Token Generation |
20-40M tokens daily |
| Tool Call Error Rate |
~0.05% |
Capabilities
- Reasoning: Strong performance on mathematical and logical reasoning benchmarks
- Code Generation: Proficient in multiple programming languages
- Multilingual: Supports 100+ languages and dialects
- Long Context: Effective utilization of extended context windows
## API Usage Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "Explain the concept of mixture-of-experts architecture in neural networks."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'
```
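The endpoint path suggests an OpenAI-compatible API, so the same request can presumably be made with the official `openai` Python client pointed at the LangMart base URL (an assumption to verify against LangMart's documentation):

```python
import os
from openai import OpenAI

# Assumes the endpoint is OpenAI-compatible, as /v1/chat/completions suggests.
client = OpenAI(
    base_url="https://api.langmart.ai/v1",
    api_key=os.environ["LANGMART_API_KEY"],
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b",
    messages=[
        {"role": "user", "content": "Explain the concept of mixture-of-experts "
                                    "architecture in neural networks."}
    ],
    max_tokens=2048,
    temperature=0.7,
)
print(response.choices[0].message.content)
```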
## Reasoning Mode Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "Solve this step by step: If a train travels 120 km in 2 hours, then stops for 30 minutes, then travels 90 km in 1.5 hours, what is the average speed for the entire journey?"}
    ],
    "reasoning": true,
    "include_reasoning": true,
    "max_tokens": 4096
  }'
```
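Depending on the provider, the reasoning may come back as a separate field or inline between `<think>` and `</think>` in the message content. A minimal sketch handling the inline case (the `reasoning` fallback field name is an assumption; inspect a real response to confirm):

```python
import re

def split_reasoning(message: dict) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer.

    Falls back to a "reasoning" field on the message if present; that field
    name is an assumption -- inspect an actual API response to confirm.
    """
    content = message.get("content") or ""
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = (content[:match.start()] + content[match.end():]).strip()
        return reasoning, answer
    return (message.get("reasoning") or "").strip(), content.strip()

# Example with inline tags (210 km total over 4 h total, including the stop):
msg = {"content": "<think>120 + 90 = 210 km; 2 + 0.5 + 1.5 = 4 h</think>"
                  "The average speed is 52.5 km/h."}
reasoning, answer = split_reasoning(msg)
print(answer)  # The average speed is 52.5 km/h.
```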
## Tool Calling Example

```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
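When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain content; the caller runs the function and sends the result back in a `role: "tool"` message. A sketch of that round trip, assuming the standard OpenAI-style tool-calling schema and a hypothetical local `get_weather` implementation:

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.langmart.ai/v1",
                api_key=os.environ["LANGMART_API_KEY"])

def get_weather(location: str) -> str:
    """Hypothetical local implementation backing the get_weather tool."""
    return json.dumps({"location": location, "temp_c": 18, "sky": "clear"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string",
                                        "description": "City name"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]
resp = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b", messages=messages,
    tools=tools, tool_choice="auto")

call = resp.choices[0].message.tool_calls[0]   # the model asked for a tool
args = json.loads(call.function.arguments)     # e.g. {"location": "Tokyo"}

# Append the assistant turn and the tool result, then ask for the final answer.
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": get_weather(**args)})
final = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b", messages=messages, tools=tools)
print(final.choices[0].message.content)
```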
## Qwen3 Model Family

| Model ID | Parameters | Active Params | Description |
|---|---|---|---|
| `qwen/qwen3-235b-a22b` | 235B | 22B | Flagship MoE model |
| `qwen/qwen3-32b` | 32B | 32B | Dense model variant |
| `qwen/qwen3-14b` | 14B | 14B | Mid-size dense model |
| `qwen/qwen3-7b` | 7B | 7B | Lightweight model |
| `qwen/qwen3-4b` | 4B | 4B | Compact model |
## Similar MoE Models

| Model ID | Description |
|---|---|
| `deepseek/deepseek-v3` | DeepSeek V3 (MoE architecture) |
| `mistralai/mixtral-8x22b` | Mixtral 8x22B (8-expert MoE) |
| `databricks/dbrx-instruct` | DBRX Instruct (MoE model) |
## Providers

### Primary Provider: DeepInfra

| Property | Value |
|---|---|
| Provider | DeepInfra |
| Quantization | FP8 |
| Context Length | 40,960 tokens |
| Max Completion Tokens | 40,960 |
| Tool Support | Yes |
| Multipart Support | Yes |
| Reasoning Support | Yes |
## Notes
- This model is part of Qwen's third-generation model family released in April 2025
- The MoE architecture (235B total, 22B active) provides an excellent balance of capability and efficiency
- Supports both "thinking" mode for complex tasks and "non-thinking" mode for quick responses
- YaRN-based context extension allows handling documents up to 131K tokens
- Currently available through DeepInfra with FP8 quantization
- Native support for tool calling makes it suitable for agent-based applications
Source: LangMart Model Registry
Last Updated: December 23, 2025