DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract

This paper introduces DeepSeek’s first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, trained through large-scale reinforcement learning (RL) without supervised fine-tuning (SFT), demonstrates remarkable reasoning capabilities; through RL alone, it naturally develops powerful reasoning behaviors. However, it faces challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, DeepSeek-R1 was developed,…
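To make the "RL without SFT" idea concrete, the sketch below is a minimal, illustrative example (not the paper's implementation): completions are scored with simple rule-based rewards (answer correctness plus a format check), and rewards are normalized within each group of samples for the same prompt, in the spirit of the group-relative advantages used by GRPO-style training. The function names, reward values, and sample strings are hypothetical.

```python
# Illustrative sketch only: rule-based rewards and group-relative advantages,
# assuming a GRPO-style objective; all names and constants are hypothetical.
from statistics import mean, pstdev


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Score a completion with rules, not a learned reward model:
    1.0 if the final answer matches, plus 0.5 if reasoning is wrapped
    in <think>...</think> tags (a format reward)."""
    accuracy = 1.0 if completion.strip().endswith(reference_answer) else 0.0
    format_bonus = 0.5 if "<think>" in completion and "</think>" in completion else 0.0
    return accuracy + format_bonus


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt:
    advantage_i = (r_i - mean(r)) / std(r). These advantages would then
    weight the policy-gradient update on the corresponding tokens."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards match
    return [(r - mu) / sigma for r in rewards]


# Example: four sampled completions for one prompt, scored without SFT labels.
samples = [
    "<think>2 + 2 = 4</think> 4",
    "The answer is 5",
    "<think>compute the sum</think> 4",
    "4",
]
rewards = [rule_based_reward(s, "4") for s in samples]
print(group_relative_advantages(rewards))
```

In this sketch the reward signal comes entirely from verifiable rules, which is what allows the RL stage to run without a supervised fine-tuning step or a neural reward model.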