
Qwen: Qwen3 235B A22B

Inference Model ID: qwen/qwen3-235b-a22b

Overview

Property Value
Provider Qwen (Alibaba Cloud)
Model ID qwen/qwen3-235b-a22b
Permaslug qwen/qwen3-235b-a22b-04-28
Created April 28, 2025
Context Length 40,960 tokens (provider limit; native 32,768, extendable to 131,072 with YaRN)
Max Completion Tokens 40,960
Input Modalities Text
Output Modalities Text

Description

Qwen3-235B-A22B is a 235 billion parameter mixture-of-experts (MoE) model developed by Qwen (Alibaba Cloud), activating 22 billion parameters per forward pass. This architecture allows the model to deliver exceptional performance while maintaining computational efficiency.
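The efficiency claim rests on sparse routing: only a few experts run per token, so most of the 235B parameters sit idle on any given forward pass. A minimal, illustrative sketch of top-k gating follows; it is a toy with scalar "experts", not Qwen's actual routing code.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Route input x to the top-k experts by gate score and combine
    their outputs, weighted by renormalized gate probabilities."""
    # Gate scores: one logit per expert (here a simple dot product).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Only the k highest-scoring experts are evaluated; the rest are
    # skipped entirely, which is where MoE's compute savings come from.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Four toy "experts", each a trivial scalar function of the input.
experts = [lambda x, c=c: c * sum(x) for c in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.1, 0.0], [0.9, 0.0], [0.0, 0.2], [0.0, 0.8]]

y = moe_forward([1.0, 1.0], experts, gate_weights, k=2)
```

Here experts 1 and 3 win the gate, so experts 0 and 2 never execute; at Qwen3's scale the same idea activates roughly 22B of 235B parameters per token.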

The model supports seamless switching between two operational modes:

  • "Thinking" Mode: For complex reasoning, mathematics, and coding tasks; emits explicit reasoning tokens (<think> and </think>)
  • "Non-thinking" Mode: For general conversation, with faster response times

Key characteristics:

  • Strong Reasoning Ability: Excels at mathematical problem-solving and complex logical tasks
  • Multilingual Support: Supports 100+ languages and dialects
  • Advanced Instruction-Following: High accuracy in following complex instructions
  • Agent Tool-Calling: Native support for tool/function calling capabilities
  • Extended Context: Native 32K context window, extendable to 131K tokens using YaRN-based scaling

Pricing

Type Price per Million
Input Tokens $0.18
Output Tokens $0.54
Request Fee $0.00

Cost Comparison

  • This model offers competitive pricing for a 235B parameter MoE model
  • The MoE architecture (22B active parameters) enables lower per-token costs compared to dense models of similar capability
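Given the listed rates ($0.18/M input, $0.54/M output, no request fee), the cost of a single request is straightforward to estimate:

```python
# Prices from the table above, in USD per million tokens.
INPUT_PRICE = 0.18
OUTPUT_PRICE = 0.54

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token rates."""
    return (input_tokens / 1_000_000 * INPUT_PRICE
            + output_tokens / 1_000_000 * OUTPUT_PRICE)

# e.g. a 2,000-token prompt with a 1,000-token completion:
cost = request_cost(2_000, 1_000)  # 0.00036 + 0.00054 = 0.0009 USD
```

Note that "thinking" mode inflates output-token counts, since reasoning tokens are billed as output.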

Supported Parameters

Parameter Type Description
reasoning boolean Enable reasoning mode with explicit thinking
include_reasoning boolean Include reasoning tokens in response
max_tokens integer Maximum number of tokens to generate
temperature float Sampling temperature (0-2)
top_p float Nucleus sampling probability
top_k integer Top-k sampling parameter
stop array Stop sequences
frequency_penalty float Frequency penalty for token repetition
presence_penalty float Presence penalty for new topics
repetition_penalty float Repetition penalty factor
seed integer Seed for reproducible outputs
min_p float Minimum probability threshold
response_format object Format specification for the response
tools array List of tools available to the model
tool_choice string/object Tool selection mode: auto, none, or specific tool

Default Configuration

  • Default Stop Tokens: <|im_start|>, <|im_end|>
  • Instruction Type: Qwen3

Features

  • Reasoning Tokens: Uses <think> and </think> tokens for explicit reasoning
  • Tool Calling: Native support for function/tool calling
  • Prompt Caching: Supported for repeated requests
  • Multipart Input: Supports multipart message format
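Because reasoning arrives inline between <think> and </think>, clients often want to separate it from the user-visible answer. A minimal sketch, assuming the reasoning block appears once at the start of the text (providers may instead return it in a separate response field):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer.
    Assumes the reasoning block, if present, appears once at the start."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()

reasoning, answer = split_reasoning(
    "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
)
```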

Performance

Architecture Details

Specification Value
Total Parameters 235 billion
Active Parameters 22 billion per forward pass
Architecture Mixture-of-Experts (MoE)
Native Context 32,768 tokens
Extended Context 131,072 tokens (via YaRN scaling)

Recent Usage Statistics

Metric Value
Peak Daily Requests ~150,000 (December 16, 2025)
Reasoning Token Generation 20-40M tokens daily
Tool Call Error Rate ~0.05%

Capabilities

  • Reasoning: Strong performance on mathematical and logical reasoning benchmarks
  • Code Generation: Proficient in multiple programming languages
  • Multilingual: Supports 100+ languages and dialects
  • Long Context: Effective utilization of extended context windows

API Usage Example

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "Explain the concept of mixture-of-experts architecture in neural networks."}
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'

Reasoning Mode Example

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "Solve this step by step: If a train travels 120 km in 2 hours, then stops for 30 minutes, then travels 90 km in 1.5 hours, what is the average speed for the entire journey?"}
    ],
    "reasoning": true,
    "include_reasoning": true,
    "max_tokens": 4096
  }'
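For reference, the expected answer to the prompt above is easy to verify: the average speed over the whole journey divides total distance by total elapsed time, including the stop.

```python
# Worked check of the word problem in the request above.
distance_km = 120 + 90            # 210 km total
time_h = 2 + 0.5 + 1.5            # 4.0 h total, counting the 30-minute stop
avg_speed = distance_km / time_h  # 52.5 km/h
```

A correct reasoning-mode response should arrive at 52.5 km/h; answers of 60 km/h indicate the model ignored the stop time.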

Tool Calling Example

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b",
    "messages": [
      {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
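When the model decides to call get_weather, the client must execute the tool and send the result back in a follow-up request. A sketch of that round trip, assuming an OpenAI-compatible response shape (choices[0].message.tool_calls) — the exact payload below is hypothetical, not taken from this page:

```python
import json

# Hypothetical assistant response to the request above, in the
# OpenAI-compatible shape (an assumption, not confirmed by this page).
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather",
                             "arguments": "{\"location\": \"Tokyo\"}"},
            }],
        }
    }]
}

message = response["choices"][0]["message"]
call = message["tool_calls"][0]
args = json.loads(call["function"]["arguments"])  # {"location": "Tokyo"}

# Run the tool locally, then append both the assistant turn and the tool
# result so the model can produce a final answer on the next request.
weather = {"location": args["location"], "temp_c": 12, "condition": "clear"}
followup_messages = [
    {"role": "user", "content": "What is the weather in Tokyo?"},
    message,
    {"role": "tool", "tool_call_id": call["id"],
     "content": json.dumps(weather)},
]
```

The tool message's tool_call_id must match the id from the assistant's tool call so the model can pair results with requests.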

Qwen3 Model Family

Model ID Parameters Active Params Description
qwen/qwen3-235b-a22b 235B 22B Flagship MoE model
qwen/qwen3-32b 32B 32B Dense model variant
qwen/qwen3-14b 14B 14B Mid-size dense model
qwen/qwen3-8b 8B 8B Lightweight dense model
qwen/qwen3-4b 4B 4B Compact model

Similar MoE Models

Model ID Description
deepseek/deepseek-v3 DeepSeek V3 - MoE architecture
mistralai/mixtral-8x22b Mixtral 8x22B - 8 expert MoE
databricks/dbrx-instruct DBRX - MoE model

Providers

Primary Provider: DeepInfra

Property Value
Provider DeepInfra
Quantization FP8
Context Length 40,960 tokens
Max Completion Tokens 40,960
Tool Support Yes
Multipart Support Yes
Reasoning Support Yes

Notes

  • This model is part of Qwen's third-generation model family released in April 2025
  • The MoE architecture (235B total, 22B active) provides an excellent balance of capability and efficiency
  • Supports both "thinking" mode for complex tasks and "non-thinking" mode for quick responses
  • YaRN-based context extension allows handling documents up to 131K tokens
  • Currently available through DeepInfra with FP8 quantization
  • Native support for tool calling makes it suitable for agent-based applications

Source: LangMart Model Registry Last Updated: December 23, 2025