
LangMart: OpenAI: gpt-oss-120b

Provider: Openrouter · Context: 131K · Input: $0.0400 /1M · Output: $0.1900 /1M · Max Output: N/A

Model Overview

Property Value
Model ID openrouter/openai/gpt-oss-120b
Name OpenAI: gpt-oss-120b
Provider openai
Released 2025-08-05

Description

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
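Native tool use means the model can be driven through an OpenAI-compatible chat-completions request with a `tools` array. A minimal sketch of building such a request for OpenRouter (the endpoint URL and payload shape follow OpenRouter's OpenAI-compatible convention; the `get_weather` tool is a hypothetical example):

```python
import json

# OpenRouter exposes an OpenAI-compatible chat-completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Assemble a chat-completions payload with a (hypothetical) tool definition."""
    return {
        "model": "openai/gpt-oss-120b",  # model ID from the overview table
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, for illustration only
                    "description": "Look up the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_request("What's the weather in Lisbon?")
print(json.dumps(payload, indent=2))
```

The payload would then be sent with any HTTP client, e.g. `requests.post(OPENROUTER_URL, json=payload, headers={"Authorization": f"Bearer {api_key}"})`.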

Provider

openai

Specifications

Spec Value
Context Window 131,072 tokens
Modalities text->text
Input Modalities text
Output Modalities text

Pricing

Type Price
Input $0.04 per 1M tokens
Output $0.19 per 1M tokens
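At these rates, per-request cost is a simple linear function of token counts. A quick estimator, using the prices from the table above:

```python
import math

INPUT_PRICE = 0.04 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.19 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 10,000-token prompt with a 2,000-token completion:
print(f"${request_cost(10_000, 2_000):.6f}")  # ≈ $0.000780
```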

Capabilities

  • Frequency penalty
  • Include reasoning
  • Logit bias
  • Logprobs
  • Max tokens
  • Min p
  • Presence penalty
  • Reasoning
  • Reasoning effort
  • Repetition penalty
  • Response format
  • Seed
  • Stop
  • Structured outputs
  • Temperature
  • Tool choice
  • Tools
  • Top k
  • Top logprobs
  • Top p

Detailed Analysis

GPT-OSS-120b is OpenAI's large open-weight model, released under the Apache 2.0 license, with 117B total parameters in a mixture-of-experts (MoE) architecture. It activates only 5.1B parameters per token and runs efficiently within 80 GB of memory on a single GPU. The model features 128 expert sub-networks (4 active per token), a 128K native context length with RoPE positional encoding, and grouped multi-query attention. It achieves near-parity with o4-mini on reasoning benchmarks and is natively quantized in MXFP4. Pricing is $0.04 input / $0.19 output per 1M tokens. Best for: self-hosted deployments requiring strong performance, on-premise applications with data-privacy requirements, cost-sensitive large-scale deployments, and research requiring model inspection and modification.
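The single-GPU claim follows from back-of-envelope arithmetic: MXFP4 stores weights at a nominal 4 bits (0.5 bytes) per parameter, so 117B parameters occupy roughly 58.5 GB, comfortably inside an H100's 80 GB. A sketch of that calculation (the 4-bit figure ignores scale-factor and activation overhead, so the real footprint is somewhat higher):

```python
TOTAL_PARAMS = 117e9     # total parameters (MoE)
ACTIVE_PARAMS = 5.1e9    # parameters activated per forward pass
BYTES_PER_PARAM = 0.5    # MXFP4: ~4 bits per weight (scales ignored)
H100_MEMORY_GB = 80

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights ~ {weights_gb:.1f} GB; fits in {H100_MEMORY_GB} GB: {weights_gb < H100_MEMORY_GB}")
print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

The second line also shows why inference is cheap relative to model size: only about 4% of the weights participate in any given forward pass.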