Paper-DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract

This paper introduces DeepSeek’s first-generation reasoning models: DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained through large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and powerful reasoning behaviors emerge naturally during that training. However, it suffers from poor readability and language mixing. To address these issues and further improve reasoning performance, DeepSeek-R1 adds multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support research, DeepSeek open-sources both models, along with six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 and based on Qwen and Llama.

Key Contributions

Post-Training: Large-Scale Reinforcement Learning

Successfully applied RL directly to the base model without SFT
Developed DeepSeek-R1-Zero, demonstrating capabilities like self-verification and reflection
First open research validating that reasoning capabilities can be incentivized purely through RL
Introduced a pipeline for DeepSeek-R1 with two RL stages and two SFT stages
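
The RL stage in the paper uses Group Relative Policy Optimization (GRPO), which scores each sampled answer against the other answers sampled for the same prompt instead of against a learned value model. A minimal sketch of that group-relative advantage, assuming simple 0/1 rule-based rewards; the function name, the choice of population standard deviation, and the epsilon guard are illustrative assumptions, not details taken from the paper:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps (an assumption) avoids division by zero when every reward is equal
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt; 1.0 = correct, 0.0 = incorrect (hypothetical)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average get a positive advantage and the rest get a negative one, so a group where every answer earns the same reward contributes no learning signal.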

Distillation: Empowering Smaller Models

Demonstrated that reasoning patterns from larger models can be effectively distilled into smaller ones
Open-sourced DeepSeek-R1 and its API to benefit the research community
Fine-tuned several dense models showing exceptional benchmark performance
Distilled models significantly outperform previous open-source models
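
In the paper, distillation means supervised fine-tuning of the smaller Qwen and Llama models on samples generated by DeepSeek-R1, with no RL stage for the student models. A rough sketch of preparing such data, assuming a simple prompt/completion JSONL layout; the record schema, the helper name, and the sample itself are hypothetical and not DeepSeek's actual format:

```python
import json

def to_sft_record(prompt, reasoning, answer):
    """Pack one teacher sample into a prompt/completion pair for fine-tuning."""
    # The reasoning trace is wrapped in <think> tags ahead of the final answer
    completion = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {"prompt": prompt, "completion": completion}

# Hypothetical teacher output; a real dataset would hold many such samples
teacher_samples = [
    {"prompt": "What is 12 * 7?", "reasoning": "12 * 7 = 84.", "answer": "84"},
]

records = [to_sft_record(**sample) for sample in teacher_samples]
jsonl = "\n".join(json.dumps(record) for record in records)  # one record per line
```

The student only imitates the teacher's written reasoning, which is why this route is much cheaper than running RL on each small model.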

Evaluation Results

Reasoning Tasks

DeepSeek-R1 achieves 79.8% Pass@1 on AIME 2024, slightly surpassing OpenAI-o1-1217
Scores 97.3% on MATH-500, on par with OpenAI-o1-1217
Expert-level performance in code competition tasks, with a 2,029 Elo rating on Codeforces
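
Pass@1 in the paper is not a single greedy attempt: several responses are sampled per question and their correctness is averaged. A sketch of that estimate under stated assumptions; the function name and the grading outcomes below are made up for illustration:

```python
def pass_at_1(correct_flags_per_question):
    """Average, over questions, of the fraction of sampled answers that are correct."""
    per_question = [sum(flags) / len(flags) for flags in correct_flags_per_question]
    return sum(per_question) / len(per_question)

# Three questions, four sampled answers each (hypothetical grading results)
results = [
    [True, True, False, True],
    [False, False, False, False],
    [True, True, True, True],
]
score = pass_at_1(results)
```

Averaging over samples makes the score less sensitive to the randomness of any one generation than a single-attempt accuracy would be.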

Knowledge Tasks

Outstanding results on MMLU (90.8%), MMLU-Pro (84.0%), and GPQA Diamond (71.5%)
Slightly below OpenAI-o1-1217 on these benchmarks, but ahead of other closed-source models, showing a competitive edge in educational tasks
Outperforms DeepSeek-V3 on the factual benchmark SimpleQA

General Capabilities

Excels in creative writing, question answering, editing, and summarization
Length-controlled win-rate of 87.6% on AlpacaEval 2.0 and win-rate of 92.3% on ArenaHard
Strong performance in long-context understanding tasks

Future Work

The team plans to focus on:

Enhancing general capabilities in areas like function calling and complex role-playing
Addressing language mixing issues
Improving prompt engineering
Enhancing performance on software engineering tasks

Conclusion

DeepSeek-R1 represents a significant advancement in AI reasoning capabilities through reinforcement learning. The success of both the main model and its distilled versions demonstrates the potential of this approach for developing more capable AI systems. The open-source release of these models will contribute to further research and development in the field.
