
LangMart: Qwen: Qwen3 30B A3B

Openrouter | 41K context | $0.06 input /1M | $0.22 output /1M | Max output: N/A


Model Overview

Model ID: openrouter/qwen/qwen3-30b-a3b
Name: Qwen: Qwen3 30B A3B
Provider: qwen
Released: 2025-04-28
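
The model ID above is what clients pass when calling the model through OpenRouter. A minimal sketch, assuming the OpenAI-compatible chat-completions endpoint at https://openrouter.ai/api/v1 and that the upstream slug is qwen/qwen3-30b-a3b (the openrouter/ prefix being LangMart's routing namespace):

```python
import os
import requests

# Minimal chat completion against OpenRouter's OpenAI-compatible endpoint.
# Verify the exact model slug in your gateway's model list before relying on it.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "qwen/qwen3-30b-a3b",
        "messages": [
            {"role": "user", "content": "Summarize MoE routing in two sentences."}
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```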

Description

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance.

Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per token), and supports up to 131K-token contexts with YaRN, setting a new standard among open-source models.
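
The thinking/non-thinking switch is typically driven per request: Qwen3 documents the soft switches "/think" and "/no_think" appended to a user turn, and gateways such as OpenRouter expose a reasoning toggle in the request body. A minimal sketch, assuming the payload shape below is accepted by the gateway (the reasoning field names are an assumption, not a confirmed schema):

```python
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

def ask(prompt: str, thinking: bool) -> str:
    # Qwen3's documented soft switches: appending "/think" or "/no_think"
    # to a user message requests or suppresses the thinking block.
    # The "reasoning" field below is an assumption about the gateway's
    # request schema; drop it if your gateway rejects it.
    suffix = " /think" if thinking else " /no_think"
    body = {
        "model": "qwen/qwen3-30b-a3b",
        "messages": [{"role": "user", "content": prompt + suffix}],
        "reasoning": {"enabled": thinking},
    }
    reply = requests.post(API_URL, headers=HEADERS, json=body, timeout=120)
    return reply.json()["choices"][0]["message"]["content"]

print(ask("Prove that the sum of two odd integers is even.", thinking=True))
print(ask("Name the capital of France.", thinking=False))
```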

Provider

qwen

Specifications

Context Window: 40,960 tokens
Modalities: text -> text
Input Modalities: text
Output Modalities: text

Pricing

Input: $0.06 per 1M tokens
Output: $0.22 per 1M tokens
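
At these rates a request costs input_tokens × $0.06/1M plus output_tokens × $0.22/1M. A small helper showing the arithmetic (the token counts in the example are hypothetical):

```python
INPUT_PRICE_PER_M = 0.06   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.22  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token completion costs
# 4000 * 0.06/1e6 + 1000 * 0.22/1e6 = $0.00024 + $0.00022 = $0.00046.
print(f"${request_cost(4_000, 1_000):.5f}")
```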

Capabilities

  • Frequency penalty
  • Include reasoning
  • Max tokens
  • Min p
  • Presence penalty
  • Reasoning
  • Repetition penalty
  • Response format
  • Seed
  • Stop
  • Structured outputs
  • Temperature
  • Tool choice
  • Tools
  • Top k
  • Top p
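
These capabilities map to request-body parameters in an OpenAI-compatible API. A sketch that sets several of them at once (parameter names follow the common convention and should be checked against the gateway's documentation before depending on any individual knob):

```python
import os
import requests

# Exercises several listed controls: sampling parameters, a fixed seed,
# stop sequences, an output cap, and a JSON response format.
body = {
    "model": "qwen/qwen3-30b-a3b",
    "messages": [{"role": "user",
                  "content": "Return the city in: 'Ship it to Berlin.' as JSON."}],
    "temperature": 0.2,
    "top_p": 0.9,
    "top_k": 40,
    "min_p": 0.05,
    "frequency_penalty": 0.1,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.05,
    "seed": 42,
    "stop": ["\n\n"],
    "max_tokens": 256,
    "response_format": {"type": "json_object"},
}
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=body,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```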

Detailed Analysis

Qwen3-30B-A3B is a Mixture-of-Experts (MoE) language model offering large-scale capability with efficient sparse activation. Released April 2025. Key characteristics:

1. Architecture: 30.5B total parameters with ~3.3B activated per token (the "A3B" suffix denotes ~3B active parameters), cutting per-token compute by roughly 90% relative to a hypothetical dense 30B model while maintaining comparable quality. Uses the Qwen3 MoE design without shared experts, with a global-batch load-balancing loss that encourages expert specialization.
2. Performance: Comparable to dense ~30B models thanks to efficient expert activation; trained on 36T tokens, giving strong general-purpose capabilities across reasoning, coding, and language understanding.
3. Use Cases: Cost-sensitive production deployments, high-throughput applications that need large-model quality, cloud deployments optimizing compute costs, general-purpose AI at scale, and multi-tenant systems serving many users.
4. Context Window: Up to 131K tokens with YaRN extension (this endpoint lists 40,960), supporting comprehensive document processing.
5. Pricing: Reflects activated parameters (~3B) rather than total (30B), making it cost-effective for its capability level.
6. Trade-offs: The MoE architecture may introduce latency variability compared with dense models; it is a cutting-edge approach that demonstrates the efficiency of sparse activation.

Best for production applications that need large-model capability while keeping inference costs down through sparse computation.
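
The 8-of-128 expert routing can be pictured as a gate that scores all experts for each token, keeps the top k, and mixes their outputs with renormalized softmax weights; only the selected experts run, which is why roughly 3B of the 30.5B parameters are active per token. A toy sketch of that selection step, with illustrative shapes rather than the actual Qwen3 implementation:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=8):
    """Toy top-k MoE routing for one token's hidden state x.
    gate_w maps x to one logit per expert; experts is a list of
    per-expert weight matrices. Only k experts run per token."""
    logits = x @ gate_w                        # (num_experts,)
    top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # renormalized softmax over the top-k
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 64, 128
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, num_experts))
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
print(moe_layer(x, gate_w, experts).shape)     # (64,)
```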