DeepSeek R1

DeepSeek | 164K context | $0.30 input /1M tokens | $1.20 output /1M tokens | Max output: N/A

Overview

Property | Value
Model ID | deepseek/deepseek-r1
Display Name | DeepSeek: R1
Short Name | R1
Creator | DeepSeek AI
Created Date | January 20, 2025
License | MIT (fully open-source; allows distillation and commercial use)
HuggingFace | deepseek-ai/DeepSeek-R1
GitHub | deepseek-ai/DeepSeek-R1
Paper | arXiv:2501.12948

Description

DeepSeek R1 is DeepSeek AI's first-generation reasoning model, achieving performance on par with OpenAI o1 across math, code, and reasoning tasks. It is fully open-source under the MIT license, and its reasoning tokens are open: developers can access and use the model's chain-of-thought reasoning process.

The model marks a key milestone in AI research: it is the first openly published work validating that LLM reasoning capabilities can be incentivized purely through reinforcement learning (RL), without supervised fine-tuning (SFT) as a preliminary step. During RL the model learns to explore chain-of-thought for complex problem solving, and capabilities such as self-verification, reflection, and long CoT generation emerge naturally.

Key Features

  • Performance comparable to OpenAI o1 on math, code, and reasoning benchmarks
  • Fully open-source with MIT license (supports commercial use, modifications, and distillation)
  • Open reasoning tokens with visible chain-of-thought process
  • Trained via large-scale reinforcement learning with a hybrid two-stage pipeline
  • Available in multiple distilled variants from 1.5B to 70B parameters

Pricing

LangMart Pricing (via Chutes Provider - fp8 quantization)

Type | Price per 1M Tokens
Input | $0.30
Output | $1.20

DeepSeek API Direct Pricing

Type | Price per 1M Tokens
Input | $0.55
Output | $2.19

Cost Comparison with OpenAI o1

Model | Input (per 1M) | Output (per 1M)
DeepSeek R1 (API) | $0.55 | $2.19
OpenAI o1 | $15.00 | $60.00
Cost Savings | ~27x cheaper | ~27x cheaper
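
The "~27x" figure follows directly from the API prices above; a quick sanity check in Python:

# Sanity-check the ~27x savings figure using the prices quoted above
o1_input, o1_output = 15.00, 60.00   # OpenAI o1, USD per 1M tokens
r1_input, r1_output = 0.55, 2.19     # DeepSeek R1 API, USD per 1M tokens

print(round(o1_input / r1_input, 1))    # 27.3x cheaper on input
print(round(o1_output / r1_output, 1))  # 27.4x cheaper on output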

Supported Parameters

Parameter | Supported | Description
reasoning | Yes | Enable reasoning mode
include_reasoning | Yes | Include reasoning tokens in the response
structured_outputs | Yes | Enable structured output format
response_format | Yes | Specify response format (JSON, etc.)
max_tokens | Yes | Maximum tokens to generate
temperature | Yes | Sampling temperature (recommended: 0.5-0.7, optimal: 0.6)
top_p | Yes | Nucleus sampling probability
top_k | Yes | Top-k sampling
stop | Yes | Stop sequences
frequency_penalty | Yes | Frequency penalty for token repetition
presence_penalty | Yes | Presence penalty for topic diversity
repetition_penalty | Yes | Repetition penalty
seed | Yes | Random seed for reproducibility
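
A minimal sketch of passing several of these parameters through an OpenAI-compatible Python client. The base URL matches the LangMart example later on this page; include_reasoning is a provider-specific field, so it is sent via extra_body and should be treated as an assumption rather than a guaranteed field name:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",   # same endpoint as the curl example below
    api_key="YOUR_LANGMART_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Briefly explain why comparison sorts need at least O(n log n) comparisons."}],
    temperature=0.6,       # recommended range: 0.5-0.7
    top_p=0.95,
    max_tokens=2048,
    seed=42,               # reproducibility
    extra_body={"include_reasoning": True},  # provider-specific reasoning flag (assumption)
)
print(response.choices[0].message.content)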

Tool Support

  • Basic function calling capability supported
  • Structured outputs capability available
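
A hedged sketch of basic function calling using the standard OpenAI-compatible tools format. The tool name and schema below are invented for illustration, and actual tool-call behavior depends on the provider:

from openai import OpenAI

client = OpenAI(base_url="https://api.langmart.ai/v1", api_key="YOUR_LANGMART_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",   # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "What's the weather like in Paris right now?"}],
    tools=tools,
    temperature=0.6,
)
# If the model decides to call the tool, the call shows up here instead of plain text
print(response.choices[0].message.tool_calls)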

DeepSeek Family

Model ID | Description
deepseek/deepseek-chat | DeepSeek Chat model
deepseek/deepseek-v3.1 | DeepSeek V3.1 hybrid reasoning model
deepseek/deepseek-v3.2 | DeepSeek V3.2 with improved efficiency
deepseek/deepseek-v3.2-speciale | High-compute reasoning variant

Competitor Reasoning Models

Model | Provider | Notes
OpenAI o1 | OpenAI | Proprietary reasoning model
OpenAI o3 | OpenAI | Latest reasoning model (o3-high variant)
Claude 3.5 Sonnet | Anthropic | Strong reasoning capabilities

Providers

Chutes (Primary Provider on LangMart)

Property | Value
Quantization | fp8
Context Length | 163,840 tokens
Headquarters | United States
Base URL | https://llm.chutes.ai/v1
Multi-part Support | Yes
BYOK Enabled | Yes
Abort Capability | Supported

Other Available Variants on LangMart

Model ID | Description
deepseek/deepseek-r1:free | Free tier version
deepseek/deepseek-r1-0528 | May 2025 upgraded release
deepseek/deepseek-r1-0528:free | Free tier of the May 2025 version
deepseek/deepseek-r1-0528-qwen3-8b | Distilled 8B hybrid model
deepseek/deepseek-r1-distill-llama-70b | Llama 70B distilled variant
deepseek/deepseek-r1-distill-qwen-32b | Qwen 32B distilled variant

Architecture

Specification | Details
Total Parameters | 671 billion
Active Parameters | 37 billion (per inference pass)
Architecture Type | Mixture of Experts (MoE)
Base Model | DeepSeek-V3-Base
Context Length | 163,840 tokens on LangMart (128K per the official model card)
Input Modalities | Text
Output Modalities | Text

Reasoning Token Format

The model supports structured reasoning with configurable tokens:

  • Start Token: <think>
  • End Token: </think>

Default Stop Sequences

  • <|User|>
  • <|end_of_sentence|>
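
Given the token format above, separating the visible reasoning from the final answer is a simple string operation. A minimal sketch, assuming the reasoning arrives inline in the message content wrapped in <think>...</think>:

import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is present."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning("<think>100 * 101 / 2 = 5050</think>The sum is 5050.")
print(answer)  # "The sum is 5050."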

Performance Benchmarks

Mathematics

Benchmark | DeepSeek R1 | OpenAI o1 | o1-mini | GPT-4o
AIME 2024 | 79.8% | 79.2% | 63.6% | 9.3%
MATH-500 | 97.3% | 96.4% | 90.0% | 74.6%

Coding

Benchmark | DeepSeek R1 | OpenAI o1 | o1-mini | GPT-4o
Codeforces Rating | 2,029 | 2,061 | 1,820 | 759
Codeforces Percentile | 96.3% | 96.6% | 93.4% | 23.6%
LiveCodeBench | 65.9% | 63.4% | 53.8% | 34.2%

General Reasoning

Benchmark | DeepSeek R1 | OpenAI o1 | Claude 3.5 Sonnet | GPT-4o
MMLU | 90.8% | 91.8% | 88.3% | 87.2%
MMLU-Pro | 84.0% | - | 78.0% | 72.6%
DROP (F1) | 92.2% | 90.2% | 88.3% | 83.7%
GPQA Diamond | 71.5% | 76.0% | - | -

R1-0528 Updated Benchmarks (May 2025)

Benchmark | R1 (original) | R1-0528 | Improvement
AIME 2024 | 79.8% | 91.4% | +11.6 pp
AIME 2025 | 70.0% | 87.5% | +17.5 pp
Codeforces (Div1 rating) | ~1,530 | ~1,930 | +400 Elo

Distilled Models

DeepSeek provides six distilled variants trained on 800K curated samples from DeepSeek-R1:

Model | Base | Parameters | AIME 2024 | MATH-500
DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 1.5B | - | -
DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 7B | - | -
DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 8B | - | -
DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 14B | - | -
DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 32B | 72.6% | 94.3%
DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 70B | 70.0% | 94.5%

Note: DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI o1-mini across most benchmarks.


Usage Recommendations

Critical Configuration Settings

  1. Temperature: Use 0.5-0.7 (0.6 recommended)
    • Prevents endless repetition and incoherent outputs
  2. System Prompt: Avoid system prompts
    • Put all instructions in the user prompt
  3. Math Problems: Include the directive:
    "Please reason step by step, and put your final answer within \boxed{}."
  4. Enforce Reasoning: Force output to start with <think>\n (see the prefill sketch after the example call below)
    • The model may skip the thinking pattern on certain queries
    • Enforcing the think tag ensures thorough reasoning
  5. Evaluation: Run multiple tests and average the results for an accurate assessment

Example API Call

curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1",
    "messages": [
      {
        "role": "user",
        "content": "Solve: What is the sum of all integers from 1 to 100? Please reason step by step, and put your final answer within \\boxed{}."
      }
    ],
    "temperature": 0.6
  }'
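
For recommendation 4 above, a common approach with local inference is to prefill the generation with <think>\n so the model always opens its reasoning block. A sketch using the Hugging Face chat template; note that newer checkpoints may already enforce this in their template, so check before appending it twice:

from transformers import AutoTokenizer

# Build a prompt that ends with "<think>\n", forcing the reasoning block to start
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B")
messages = [{"role": "user", "content": "Solve: what is 17 * 23? Please reason step by step, and put your final answer within \\boxed{}."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
if not prompt.endswith("<think>\n"):
    prompt += "<think>\n"
# Feed `prompt` to your inference engine (vLLM, SGLang, transformers generate, ...)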

Training Methodology

Two-Stage Approach

Stage 1 - Large-Scale Reinforcement Learning (DeepSeek-R1-Zero):

  • Direct RL application to base model without prior SFT
  • Model explores chain-of-thought (CoT) for complex problem solving
  • Emerges with self-verification, reflection, and long CoT capabilities

Stage 2 - Hybrid Pipeline (DeepSeek-R1):

  • Two RL stages: Discover improved reasoning patterns and align with human preferences
  • Two SFT stages: Seed reasoning and non-reasoning capabilities
  • 800K curated samples generated by this pipeline are later used to distill the smaller dense models
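
The paper describes the RL stages as using rule-based rewards rather than a learned reward model: an accuracy reward that checks the final \boxed{} answer and a format reward that checks the <think>...</think> structure. A toy sketch of that idea (not DeepSeek's actual training code):

import re

def format_reward(completion: str) -> float:
    """1.0 if the reasoning is wrapped in <think>...</think>, else 0.0."""
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """1.0 if the last \\boxed{...} expression matches the reference answer."""
    boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if boxed and boxed[-1].strip() == gold_answer.strip() else 0.0

def total_reward(completion: str, gold_answer: str) -> float:
    return accuracy_reward(completion, gold_answer) + format_reward(completion)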

Key Achievement

DeepSeek-R1-Zero demonstrates that reasoning capabilities can emerge through pure RL with no SFT at all. Building on this, DeepSeek shows that the reasoning patterns discovered by the larger model can be distilled into smaller dense models, yielding better results than applying RL directly to those small models.


Official Platforms

  • Chat: chat.deepseek.com (enable the "DeepThink" button to use R1)
  • API: platform.deepseek.com (OpenAI-compatible API)

Local Deployment

vLLM (for distilled models)

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --enforce-eager

SGLang

python3 -m sglang.launch_server \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --trust-remote-code \
  --tp 2
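
Both commands above expose an OpenAI-compatible HTTP endpoint. A minimal query sketch, assuming vLLM's default port 8000 (SGLang defaults to port 30000, so adjust base_url accordingly):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")  # key is unused locally

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "What is 17 * 23? Please reason step by step, and put your final answer within \\boxed{}."}],
    temperature=0.6,
)
print(response.choices[0].message.content)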

Citation

@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
      title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
      author={DeepSeek-AI},
      year={2025},
      eprint={2501.12948},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.12948},
}
