LangMart: OpenAI: gpt-oss-20b
Model Overview
| Property | Value |
|---|---|
| Model ID | openrouter/openai/gpt-oss-20b |
| Name | OpenAI: gpt-oss-20b |
| Provider | openai |
| Released | 2025-08-05 |
Description
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.
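To make the agentic features concrete, here is a minimal sketch of a chat completion routed through OpenRouter's OpenAI-compatible endpoint with the reasoning effort dialed down. The environment variable name and prompt are illustrative, and it assumes the slug sent to OpenRouter itself is `openai/gpt-oss-20b` (the `openrouter/` prefix in the Model ID below appears to be this catalog's convention).

```python
# Minimal sketch: calling gpt-oss-20b via OpenRouter's OpenAI-compatible
# chat completions API. Assumes the `openai` Python SDK is installed and
# OPENROUTER_API_KEY is set (the env var name is our convention).
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    # Reasoning level configuration: "low", "medium", or "high".
    extra_body={"reasoning": {"effort": "low"}},
)
print(response.choices[0].message.content)
```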
Specifications
| Spec | Value |
|---|---|
| Context Window | 131,072 tokens |
| Modalities | text->text |
| Input Modalities | text |
| Output Modalities | text |
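As a quick sanity check against the context window, the sketch below counts prompt tokens with tiktoken's `o200k_base` encoding. This is an approximation, since the gpt-oss models use their own Harmony-oriented encoding, and the helper function is ours.

```python
# Sketch: checking a prompt against the 131,072-token context window.
# o200k_base is used as an approximation of gpt-oss's tokenizer.
import tiktoken

CONTEXT_WINDOW = 131_072
enc = tiktoken.get_encoding("o200k_base")

def fits_in_context(prompt: str, max_output_tokens: int = 4_096) -> bool:
    """Rough check that prompt plus reserved output stays in the window."""
    return len(enc.encode(prompt)) + max_output_tokens <= CONTEXT_WINDOW

print(fits_in_context("Hello, world!"))  # True
```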
Pricing
| Type | Price |
|---|---|
| Input | $0.03 per 1M tokens |
| Output | $0.14 per 1M tokens |
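A worked example of what these rates mean per request; the token counts are made up for illustration.

```python
# Worked example: estimating request cost from the pricing table above.
INPUT_PER_M = 0.03   # USD per 1M input tokens
OUTPUT_PER_M = 0.14  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A 10,000-token prompt with a 2,000-token completion:
# 0.01 * $0.03 + 0.002 * $0.14 = $0.00058
print(f"${request_cost(10_000, 2_000):.5f}")
```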
Capabilities
- Frequency penalty
- Include reasoning
- Logit bias
- Max tokens
- Min p
- Presence penalty
- Reasoning
- Reasoning effort
- Repetition penalty
- Response format
- Seed
- Stop
- Structured outputs
- Temperature
- Tool choice
- Tools
- Top k
- Top p
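The sketch below exercises several of these parameters in one raw HTTP request: sampling controls (temperature, top p, max tokens), a seed, a stop sequence, and structured outputs via a JSON schema. Field names follow OpenRouter's OpenAI-compatible API; the schema itself is a hypothetical illustration.

```python
# Sketch: combining sampling parameters, a seed, a stop sequence, and a
# structured-output JSON schema in a single OpenRouter request.
import os
import json
import requests

payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Name one MoE model and its license."}],
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 256,
    "seed": 42,
    "stop": ["\n\n"],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "model_info",  # hypothetical schema for illustration
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "model": {"type": "string"},
                    "license": {"type": "string"},
                },
                "required": ["model", "license"],
                "additionalProperties": False,
            },
        },
    },
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
)
print(json.dumps(resp.json()["choices"][0]["message"], indent=2))
```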
Detailed Analysis
gpt-oss-20b is OpenAI's compact open-weight model, released under the Apache 2.0 license with 21B total parameters in an MoE architecture. Only 3.6B parameters are active per token, so the model runs within 16GB of memory. It uses 32 experts with 4 active per token, a native 128K context window with RoPE, and grouped multi-query attention with a group size of 8. Despite the small active parameter count, it delivers performance similar to o3-mini on common benchmarks, and it ships natively quantized in MXFP4. At $0.03/$0.14 per 1M input/output tokens, it is best suited for:
- edge deployments and resource-constrained environments
- mobile and embedded AI applications
- local development and testing
- cost-sensitive production deployments
- applications requiring fast inference with minimal memory
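A back-of-envelope check on those deployment numbers, assuming MXFP4's roughly 4.25 bits per weight (4-bit values plus a shared scale per 32-element block) applies to all 21B parameters. In the released checkpoints only the MoE weights are MXFP4, so the real footprint lands between this estimate and the 16GB figure above.

```python
# Back-of-envelope check on the 16GB deployment claim.
# Assumption: all 21B weights at MXFP4's ~4.25 bits/weight; in practice
# only the MoE weights are quantized, so the true footprint is larger.
TOTAL_PARAMS = 21e9
ACTIVE_PARAMS = 3.6e9
BITS_PER_WEIGHT = 4.25

weights_gb = TOTAL_PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"~{weights_gb:.1f} GB of weights")                      # ~11.2 GB
print(f"{ACTIVE_PARAMS / TOTAL_PARAMS:.0%} active per token")  # 17%
```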