DeepSeek R1
Overview
Description
DeepSeek R1 is DeepSeek AI's first-generation reasoning model, achieving performance on par with OpenAI o1 across math, code, and reasoning tasks. It is open-source under the MIT license, with open reasoning tokens that let developers access and build on the model's chain-of-thought.
The model marks a key milestone in AI research: it is the first openly published work validating that LLM reasoning capabilities can be developed purely through reinforcement learning (RL), without supervised fine-tuning (SFT) as a prerequisite. Through RL, the model learns to explore chain-of-thought (CoT) for complex problem solving and develops self-verification, reflection, and long-CoT capabilities.
Key Features
- Performance comparable to OpenAI o1 on math, code, and reasoning benchmarks
- Fully open-source with MIT license (supports commercial use, modifications, and distillation)
- Open reasoning tokens with visible chain-of-thought process
- Trained via large-scale reinforcement learning with a hybrid two-stage pipeline
- Available in multiple distilled variants from 1.5B to 70B parameters
Pricing
LangMart Pricing (via Chutes Provider - fp8 quantization)
| Type | Price per 1M Tokens |
|------|---------------------|
| Input | $0.30 |
| Output | $1.20 |
DeepSeek API Direct Pricing
| Type | Price per 1M Tokens |
|------|---------------------|
| Input | $0.55 |
| Output | $2.19 |
Cost Comparison with OpenAI o1
| Model | Input (per 1M) | Output (per 1M) |
|-------|----------------|-----------------|
| DeepSeek R1 (API) | $0.55 | $2.19 |
| OpenAI o1 | $15.00 | $60.00 |
| Cost Savings | ~27x cheaper | ~27x cheaper |
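As a quick sanity check on the ~27x figure, the ratio follows directly from the listed prices. The snippet below is purely illustrative arithmetic; the 1M-input / 1M-output workload is a hypothetical example, not a benchmark.

```python
# Illustrative arithmetic only: cost of a hypothetical 1M-input / 1M-output
# workload at the per-1M-token prices listed above.
r1_cost = 0.55 + 2.19    # DeepSeek R1 API: $2.74
o1_cost = 15.00 + 60.00  # OpenAI o1: $75.00

print(f"DeepSeek R1: ${r1_cost:.2f}  OpenAI o1: ${o1_cost:.2f}")
print(f"Savings: ~{o1_cost / r1_cost:.0f}x cheaper")  # ~27x
```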
Supported Parameters
| Parameter | Supported | Description |
|-----------|-----------|-------------|
| reasoning | Yes | Enable reasoning mode |
| include_reasoning | Yes | Include reasoning tokens in the response |
| structured_outputs | Yes | Enable structured output format |
| response_format | Yes | Specify response format (JSON, etc.) |
| max_tokens | Yes | Maximum tokens to generate |
| temperature | Yes | Sampling temperature (recommended: 0.5-0.7, optimal: 0.6) |
| top_p | Yes | Nucleus sampling probability |
| top_k | Yes | Top-k sampling |
| stop | Yes | Stop sequences |
| frequency_penalty | Yes | Frequency penalty for token repetition |
| presence_penalty | Yes | Presence penalty for topic diversity |
| repetition_penalty | Yes | Repetition penalty |
| seed | Yes | Random seed for reproducibility |
- Basic function calling is supported
- Structured outputs are available
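A minimal request sketch exercising several of these parameters is shown below. It assumes LangMart's OpenAI-compatible chat completions endpoint (the same base URL as the curl example further down) and uses the OpenAI Python client; non-standard fields such as include_reasoning and top_k are passed through extra_body, and whether the gateway accepts those exact field names is an assumption rather than a documented guarantee.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.langmart.ai/v1",  # endpoint used by the curl example below
    api_key="YOUR_LANGMART_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain nucleus sampling in two sentences."}],
    temperature=0.6,    # recommended range: 0.5-0.7
    top_p=0.95,
    max_tokens=2048,
    seed=42,            # for reproducibility
    stop=["<|User|>"],  # optional stop sequence
    # Non-standard fields are forwarded in the request body; the names below
    # mirror the parameter table and are assumptions about the gateway.
    extra_body={"include_reasoning": True, "top_k": 40, "repetition_penalty": 1.0},
)

print(response.choices[0].message.content)
```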
DeepSeek Family
| Model ID | Description |
|----------|-------------|
| deepseek/deepseek-chat | DeepSeek Chat model |
| deepseek/deepseek-v3.1 | DeepSeek V3.1 hybrid reasoning model |
| deepseek/deepseek-v3.2 | DeepSeek V3.2 with improved efficiency |
| deepseek/deepseek-v3.2-speciale | High-compute reasoning variant |
Competitor Reasoning Models
| Model | Provider | Notes |
|-------|----------|-------|
| OpenAI o1 | OpenAI | Proprietary reasoning model |
| OpenAI o3 | OpenAI | Latest reasoning model (o3-high variant) |
| Claude 3.5 Sonnet | Anthropic | Strong reasoning capabilities |
Providers
Chutes (Primary Provider on LangMart)
| Property | Value |
|----------|-------|
| Quantization | fp8 |
| Context Length | 163,840 tokens |
| Headquarters | United States |
| Base URL | https://llm.chutes.ai/v1 |
| Multi-part Support | Yes |
| BYOK Enabled | Yes |
| Abort Capability | Supported |
Other Available Variants on LangMart
| Model ID | Description |
|----------|-------------|
| deepseek/deepseek-r1:free | Free tier version |
| deepseek/deepseek-r1-0528 | May 2025 upgraded release |
| deepseek/deepseek-r1-0528:free | Free tier of the May 2025 version |
| deepseek/deepseek-r1-0528-qwen3-8b | Distilled 8B hybrid model |
| deepseek/deepseek-r1-distill-llama-70b | Llama 70B distilled variant |
| deepseek/deepseek-r1-distill-qwen-32b | Qwen 32B distilled variant |
Architecture
| Specification | Details |
|---------------|---------|
| Total Parameters | 671 billion |
| Active Parameters | 37 billion (per inference pass) |
| Architecture Type | Mixture of Experts (MoE) |
| Base Model | DeepSeek-V3-Base |
| Context Length | 163,840 tokens (via Chutes) / 128K tokens (official) |
| Input Modalities | Text |
| Output Modalities | Text |
The model supports structured reasoning with configurable tokens:
- Start token: <think>
- End token: </think>
Default Stop Sequences
- <|User|>
- <|end_of_sentence|>
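Because the reasoning is delimited by these start and end tokens, it can be separated from the final answer with plain string handling. The sketch below is a minimal illustration; it assumes the reasoning is returned inline in the message content, whereas some providers expose it as a separate reasoning field instead.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is present."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

sample = "<think>1 + 2 + ... + 100 = 100 * 101 / 2 = 5050</think>\nThe sum is \\boxed{5050}."
reasoning, answer = split_reasoning(sample)
print(answer)  # The sum is \boxed{5050}.
```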
Mathematics
| Benchmark | DeepSeek R1 | OpenAI o1 | o1-mini | GPT-4o |
|-----------|-------------|-----------|---------|--------|
| AIME 2024 | 79.8% | 79.2% | 63.6% | 9.3% |
| MATH-500 | 97.3% | 96.4% | 90.0% | 74.6% |
Coding
| Benchmark | DeepSeek R1 | OpenAI o1 | o1-mini | GPT-4o |
|-----------|-------------|-----------|---------|--------|
| Codeforces Rating | 2,029 | 2,061 | 1,820 | 759 |
| Codeforces Percentile | 96.3% | 96.6% | 93.4% | 23.6% |
| LiveCodeBench | 65.9% | 63.4% | 53.8% | 34.2% |
General Reasoning
| Benchmark | DeepSeek R1 | OpenAI o1 | Claude 3.5 | GPT-4o |
|-----------|-------------|-----------|------------|--------|
| MMLU | 90.8% | 91.8% | 88.3% | 87.2% |
| MMLU-Pro | 84.0% | - | 78.0% | 72.6% |
| DROP (F1) | 92.2% | 90.2% | 88.3% | 83.7% |
| GPQA Diamond | 71.5% | 76.0% | - | - |
R1-0528 Updated Benchmarks (May 2025)
| Benchmark | R1 Original | R1-0528 | Improvement |
|-----------|-------------|---------|-------------|
| AIME 2024 | 79.8% | 91.4% | +11.6% |
| AIME 2025 | 70.0% | 87.5% | +17.5% |
| Codeforces | ~1,530 | ~1,930 | +400 Elo |
Distilled Models
DeepSeek provides six distilled variants trained on 800K curated samples from DeepSeek-R1:
| Model | Base | Parameters | AIME 2024 | MATH-500 |
|-------|------|------------|-----------|----------|
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 1.5B | - | - |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 7B | - | - |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 8B | - | - |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 14B | - | - |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 32B | 72.6% | 94.3% |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 70B | 70.0% | 94.5% |
Note: DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI o1-mini across most benchmarks.
Usage Recommendations
Critical Configuration Settings
- Temperature: use 0.5-0.7 (0.6 recommended). This prevents endless repetition and incoherent outputs.
- System prompt: avoid system prompts; place all instructions in the user prompt.
- Math problems: include the directive "Please reason step by step, and put your final answer within \boxed{}."
- Enforce reasoning: force the output to start with <think>\n. The model may skip its thinking pattern on certain queries, and enforcing the think tag ensures thorough reasoning.
- Evaluation: conduct multiple tests and average the results for an accurate assessment.
Example API Call
```bash
curl https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer $LANGMART_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-r1",
    "messages": [
      {
        "role": "user",
        "content": "Solve: What is the sum of all integers from 1 to 100? Please reason step by step, and put your final answer within \\boxed{}."
      }
    ],
    "temperature": 0.6
  }'
```
Training Methodology
Two-Stage Approach
Stage 1 - Large-Scale Reinforcement Learning:
- Direct RL application to base model without prior SFT
- Model explores chain-of-thought (CoT) for complex problem solving
- Emerges with self-verification, reflection, and long CoT capabilities
Stage 2 - Hybrid Pipeline:
- Two RL stages: Discover improved reasoning patterns and align with human preferences
- Two SFT stages: Seed reasoning and non-reasoning capabilities
- Uses 800K curated samples from DeepSeek-R1 for distillation
Key Achievement
DeepSeek-R1-Zero demonstrates that reasoning capabilities can emerge through RL alone, without SFT. The distillation results further show that reasoning patterns discovered by the larger model can be transferred effectively to smaller models, which then outperform small models trained with RL directly.
Local Deployment
vLLM (for distilled models)
```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --enforce-eager
```
SGLang
```bash
python3 -m sglang.launch_server \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --trust-remote-code \
  --tp 2
```
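Once a distilled model is running locally, it can be queried through the server's OpenAI-compatible API. The following sketch assumes vLLM's default address (http://localhost:8000/v1) and the model name from the launch command above; adjust both if your server is configured differently.

```python
from openai import OpenAI

# Local vLLM servers ignore the API key, but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{
        "role": "user",
        "content": "Solve x^2 - 5x + 6 = 0. Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
    temperature=0.6,  # per the usage recommendations above
    max_tokens=4096,
)

print(response.choices[0].message.content)
```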
Citation
```bibtex
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
  title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
  author={DeepSeek-AI},
  year={2025},
  eprint={2501.12948},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.12948},
}
```