LangMart: OpenAI: gpt-oss-20b

OpenRouter · 131K context · $0.03 input /1M · $0.14 output /1M · Max output: N/A

Model Overview

Property   Value
Model ID   openrouter/openai/gpt-oss-20b
Name       OpenAI: gpt-oss-20b
Provider   openai
Released   2025-08-05

Description

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.
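
For illustration, a minimal sketch of calling the model through an OpenAI-compatible endpoint. OpenRouter's host, the slug openai/gpt-oss-20b, and the OPENROUTER_API_KEY environment variable are assumptions here; the reasoning block reflects the configurable reasoning levels described above, following OpenRouter's parameter shape:

```python
import os

from openai import OpenAI

# Point the OpenAI SDK at OpenRouter's OpenAI-compatible endpoint
# (assumed here; any OpenAI-compatible host of gpt-oss-20b works).
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Request low reasoning effort; the "reasoning" block follows
# OpenRouter's parameter shape for reasoning-capable models.
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
    extra_body={"reasoning": {"effort": "low"}},
)

print(response.choices[0].message.content)
```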

Provider

openai

Specifications

Spec               Value
Context Window     131,072 tokens
Modalities         text->text
Input Modalities   text
Output Modalities  text

Pricing

Type    Price
Input   $0.03 per 1M tokens
Output  $0.14 per 1M tokens
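
To make the rates concrete, a small sketch of the per-request cost arithmetic at these prices; the token counts in the example are hypothetical:

```python
INPUT_RATE = 0.03   # dollars per 1M input tokens
OUTPUT_RATE = 0.14  # dollars per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Hypothetical request: 4,000 prompt tokens, 1,000 completion tokens.
print(f"${request_cost(4_000, 1_000):.6f}")  # $0.000260
```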

Capabilities

  • Frequency penalty
  • Include reasoning
  • Logit bias
  • Max tokens
  • Min p
  • Presence penalty
  • Reasoning
  • Reasoning effort
  • Repetition penalty
  • Response format
  • Seed
  • Stop
  • Structured outputs
  • Temperature
  • Tool choice
  • Tools
  • Top k
  • Top p
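
Several of these parameters can be combined in a single request. The sketch below assumes OpenRouter's OpenAI-compatible endpoint and its JSON-schema shape for structured outputs; the prompt and schema are illustrative, not part of the model's spec:

```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # assumed OpenAI-compatible host
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Combine sampling controls, a seed for repeatability, stop sequences,
# and a JSON-schema response format (structured outputs).
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "Extract the city: 'Flights to Lisbon are cheap.'"}
    ],
    temperature=0.2,
    top_p=0.9,
    seed=42,
    max_tokens=200,
    stop=["\n\n"],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    },
)

print(json.loads(response.choices[0].message.content))  # e.g. {"city": "Lisbon"}
```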

Detailed Analysis

gpt-oss-20b is OpenAI's compact open-weight model, released under the Apache 2.0 license, with 21B total parameters in a Mixture-of-Experts architecture. Only 3.6B parameters are active per token, so the model can run in roughly 16GB of memory. The architecture uses 32 experts (4 active per token), 128K native context with RoPE, and grouped multi-query attention (group size 8). Despite the small active parameter count, it delivers performance similar to o3-mini on common benchmarks. The model is natively quantized in MXFP4 and priced at $0.03/$0.14 per 1M input/output tokens.

Best for:

  • Edge deployments and resource-constrained environments
  • Mobile and embedded AI applications
  • Local development and testing
  • Cost-sensitive production deployments
  • Applications requiring fast inference with minimal memory
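
As a back-of-the-envelope check on the expert and parameter counts above, the expert share and the active-parameter share per token can be compared directly; the gap between them is explained by the dense components that run for every token:

```python
TOTAL_PARAMS_B = 21.0   # total parameters, billions
ACTIVE_PARAMS_B = 3.6   # parameters active per forward pass, billions
EXPERTS_TOTAL = 32
EXPERTS_ACTIVE = 4

expert_share = EXPERTS_ACTIVE / EXPERTS_TOTAL    # 0.125
active_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # ~0.171

print(f"experts active per token:    {expert_share:.1%}")  # 12.5%
print(f"parameters active per token: {active_share:.1%}")  # 17.1%

# The active-parameter share (~17%) exceeds the expert share (12.5%)
# because dense components -- attention, embeddings, and the router --
# run for every token regardless of which experts are selected.
```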