DeepSeek-R1
Definition
Reasoning-focused open-source language model built on the DeepSeek-V3 base. It inherits V3's Mixture-of-Experts (MoE) architecture (671B total parameters, 37B active per token) and adds large-scale reinforcement-learning post-training targeted specifically at chain-of-thought reasoning. Notable design choice: R1-Zero was trained via **pure RL without any supervised reasoning data**, and the model developed self-verification, reflection, and long chain-of-thought reasoning purely through reward signals. Released January 2025 under the MIT license. Available as the full 671B model or as distilled 1.5B, 7B, 8B, 14B, 32B, and 70B variants.
Reasoning was the gap between open-weight models and OpenAI's o1 series throughout 2024. The conventional wisdom was that o1-class reasoning required massive SFT corpora of curated chain-of-thought examples. DeepSeek-R1 challenged that view: R1-Zero, trained with pure RL on the V3 base using verifiable-reward tasks (math, code), developed strong reasoning without any human-written CoT data, and R1 reached o1-tier benchmark results. The release made frontier-class reasoning ability open-weight overnight, kicked off a 2025 wave of derivative reasoning models (QwQ, GLM-Zero, Kimi-Thinking), and established RL-first reasoning as the canonical training recipe.
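To make "verifiable reward" concrete, here is a minimal Python sketch of a rule-based reward for a math task. It is an illustration under stated assumptions, not DeepSeek's training code: the function names are hypothetical, and the `<think>...</think>` and `\boxed{...}` conventions are assumed here as the output format being checked. The point is that the reward comes from simple rules that can be verified automatically, with no learned reward model and no human-written reasoning traces.

```python
import re

# Hypothetical helpers for illustration only. The tag and \boxed{} conventions
# are assumptions of this sketch, not a description of DeepSeek's pipeline.
THINK_RE = re.compile(r"^<think>(.+?)</think>\s*(.+)$", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """1.0 if the completion looks like '<think>reasoning</think> answer'."""
    return 1.0 if THINK_RE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the last boxed value exactly matches the reference answer."""
    answers = BOXED_RE.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str) -> float:
    """Sum of the two rule-based signals."""
    return format_reward(completion) + accuracy_reward(completion, reference)
```

A completion that reasons inside the tags and boxes the right answer scores on both signals; one that skips the reasoning block, or boxes a wrong value, scores on at most one. Exact string matching is the simplest possible checker; a real setup would likely normalize equivalent answers (for example `0.5` and `1/2`) before comparing.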
Typical use cases: math-, code-, and logic-heavy agentic workloads where reasoning depth matters more than fluency; on-prem reasoning agents in regulated industries (permitted by the MIT license); local reasoning via the distilled 14B variant (the 1.5B variant runs on a laptop); use as a distillation source for further specialized reasoning models; and any deployment where the cost per reasoning token of hosted o1/o3 is prohibitive.
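In agentic deployments it is often useful to separate the model's chain of thought from the final answer before passing the result to the next step. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags; `split_reasoning` and `ParsedOutput` are hypothetical names introduced for this example, and the tag strings are parameters in case a given serving setup uses different markers.

```python
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    reasoning: str
    answer: str


def split_reasoning(
    text: str,
    open_tag: str = "<think>",
    close_tag: str = "</think>",
) -> ParsedOutput:
    """Separate the chain of thought from the final answer.

    If no closing tag is found, the whole text is treated as the answer.
    The opening tag is optional, since it may have been supplied as part
    of the prompt rather than generated (an assumption of this sketch).
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return ParsedOutput(reasoning="", answer=text.strip())
    reasoning = head.replace(open_tag, "", 1).strip()
    return ParsedOutput(reasoning=reasoning, answer=tail.strip())
```

Keeping the reasoning in a separate field lets a pipeline log or discard it independently, which matters when reasoning tokens dominate the output length.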
Recent developments
- Distilled variants: 1.5B, 7B, 8B, 14B, 32B, 70B, bringing reasoning to laptop-class hardware at the small end of the range. TokenMix demonstrated running the 1.5B variant on consumer hardware for useful reasoning workloads. Per TokenMix: R1 1.5B laptop review.
- Distilled 14B outperforms much larger open-source models on reasoning benchmarks. The 14B distillation captures most of the 671B model's reasoning ability at ~2% of the parameter count (14B / 671B ≈ 2.1%), making it the default deployment target for cost-sensitive reasoning workloads. Per Fireworks: DeepSeek-R1 deep dive.
- Pure-RL training approach validated independently. The R1-Zero approach of training without supervised CoT data has been replicated by several independent teams; it is now the canonical reasoning recipe. Per SitePoint: DeepSeek-R1 open-source reasoning.
- Benchmark performance: AIME 2024 79.8% (pass@1), MATH-500 97.3% (pass@1), Codeforces Elo 2,029. Per the original release, R1 sits in the same tier as OpenAI's o1 series on math and reasoning benchmarks. Per Chat Deep: R1 guide.
- Academic verification of relational-reasoning ability. A June 2025 arXiv paper benchmarks LLMs on deep relational reasoning; DeepSeek-R1 achieves the highest F1-scores across multiple task sizes. Per arXiv 2506.23128.
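The size relationships mentioned in the list above can be checked with simple arithmetic. The sketch below computes each variant's share of the full 671B parameter count and a rough weights-only memory footprint. The bytes-per-parameter values are assumptions about common storage precisions, and the estimate deliberately ignores KV cache, activations, and runtime overhead, so treat the numbers as lower bounds rather than hardware requirements.

```python
TOTAL_B = 671    # total parameters of the full model, in billions (from the text)
ACTIVE_B = 37    # parameters active per token, in billions (from the text)
DISTILLED_B = [1.5, 7, 8, 14, 32, 70]

# Assumed storage widths; real deployments vary by quantization scheme.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def share_of_full(params_b: float) -> float:
    """Parameter count as a percentage of the 671B total."""
    return 100.0 * params_b / TOTAL_B


def weight_gb(params_b: float, precision: str) -> float:
    """Weights-only footprint in GB (1e9 bytes); excludes KV cache and activations."""
    return params_b * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    print(f"active share of full model: {share_of_full(ACTIVE_B):.1f}%")
    for size in DISTILLED_B:
        row = ", ".join(f"{p}={weight_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM)
        print(f"{size:>4}B  {share_of_full(size):4.1f}% of 671B  {row}")
```

Billions of parameters multiplied by bytes per parameter gives gigabytes directly, which is why no unit conversion appears in `weight_gb`.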