Model Class

DeepSeek-R1

Reasoning-focused open-source language model built on the DeepSeek-V3 base. Inherits the 671B total / 37B active MoE architecture from V3 and adds large-scale reinforcement-learning post-training targeted at chain-of-thought reasoning. Notable design choice: the precursor R1-Zero was trained via **pure RL without any supervised reasoning data**, and developed self-verification, reflection, and long chain-of-thought reasoning purely through reward signals. Released January 2025 under the MIT license. Available as the full 671B model or as distilled 1.5B, 7B, 8B, 14B, 32B, and 70B variants.


Definition

What it is

Why it exists

Reasoning was the gap between open-weight models and OpenAI's o1 series throughout 2024. The conventional wisdom was that o1-class reasoning required massive SFT corpora of curated chain-of-thought examples. DeepSeek-R1 challenged that: its R1-Zero precursor, trained with pure RL on the V3 base using verifiable-reward tasks (math, code), developed strong reasoning without any human-written CoT data, and the R1 release reported o1-tier benchmark results. The release made frontier-class reasoning ability open-weight overnight, kicked off a 2025 wave of derivative reasoning models (QwQ, GLM-Zero, Kimi-Thinking), and established RL-first reasoning as the canonical training recipe.
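To make "verifiable-reward tasks" concrete, here is a minimal sketch of a rule-based reward for math problems. It is an illustration under stated assumptions, not DeepSeek's training code: the function names, the `<think>...</think>` and `\boxed{...}` output conventions, exact string matching, and the 0.5 format weight are all hypothetical choices made for this example.

```python
import re

# Assumed output format for this sketch: reasoning inside <think>...</think>,
# followed by a final answer inside \boxed{...}.
THINK_RE = re.compile(r"^<think>.+?</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def format_reward(completion: str) -> float:
    """1.0 if the completion opens with a non-empty <think> block, else 0.0."""
    return 1.0 if THINK_RE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the last boxed answer matches the reference string exactly."""
    answers = BOXED_RE.findall(completion)
    if not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == reference.strip() else 0.0


def total_reward(completion: str, reference: str, format_weight: float = 0.5) -> float:
    """Accuracy plus weighted format bonus; the weight is an arbitrary choice."""
    return accuracy_reward(completion, reference) + format_weight * format_reward(completion)
```

The point of the sketch is that the reward can be computed mechanically from the model's output and a known reference, so no human-written reasoning trace is needed to score a sample. A real checker would likely normalize answers (fractions, units, equivalent forms) rather than compare raw strings.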

Primary use cases

Math/code/logic-heavy agentic workloads where reasoning depth matters more than fluency; on-prem reasoning agents in regulated industries (MIT license); local reasoning via the distilled variants (the 1.5B variant runs on a laptop); use as a distillation source for further specialized reasoning models; and any deployment where the cost per reasoning token of hosted o1/o3 is prohibitive.
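In agentic workloads the reasoning trace usually has to be separated from the final answer before the answer is passed downstream. The sketch below assumes the raw model output wraps its reasoning in `<think>...</think>` tags; the helper name `split_reasoning` is hypothetical, and hosted APIs may already return the two parts as separate fields, in which case this step is unnecessary.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Return (reasoning, answer) from a raw completion.

    If no closing tag is present, the whole text is treated as the answer.
    The opening tag is optional, since some prompt templates may supply it.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>step 1\nstep 2</think>\nThe answer is 4.")
print(answer)
```

Keeping the reasoning separately is useful for logging and audit, while only the answer is fed to the next tool or shown to the user.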

Recent developments

Latest signals

Distilled variants: 1.5B, 7B, 8B, 14B, 32B, 70B — laptop-runnable reasoning. TokenMix demonstrated running the 1.5B variant on consumer hardware for genuinely useful reasoning workloads. Per TokenMix — R1 1.5B laptop review.
Distilled 14B outperforms much larger open-source models on reasoning benchmarks. The 14B distillation captures most of the 671B's reasoning ability at ~2% of the total parameter count (14B / 671B ≈ 2.1%), making it the default deployment target for cost-sensitive reasoning workloads. Per Fireworks: DeepSeek-R1 deep dive.
Pure-RL training recipe validated independently. The R1-Zero approach of training without supervised CoT data has been replicated by several independent teams; it is now the canonical reasoning recipe. Per SitePoint: DeepSeek-R1 open-source reasoning.
Benchmark performance: AIME 2024 79.8%, MATH-500 97.3%, Codeforces Elo 2,029. Per the original release, R1 sits in the same tier as OpenAI's o1 series on math and reasoning benchmarks. Per Chat Deep — R1 guide.
Academic verification of relational-reasoning ability. A June 2025 arXiv paper benchmarks LLMs on deep relational reasoning; DeepSeek-R1 achieves the highest F1-scores across multiple task sizes. Per arXiv 2506.23128.
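The size comparisons in the list above can be checked with simple arithmetic. The sketch below uses only the parameter counts quoted in this entry; the memory figures are weights-only estimates under the assumption of 16-bit and 4-bit storage, and they ignore KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
TOTAL_PARAMS_B = 671.0   # full model, total parameters (from the text)
ACTIVE_PARAMS_B = 37.0   # parameters active per token (from the text)
DISTILLED_B = [1.5, 7, 8, 14, 32, 70]


def weights_gib(params_b: float, bits_per_param: int) -> float:
    """Weights-only memory in GiB for a model with params_b billion parameters."""
    return params_b * 1e9 * bits_per_param / 8 / 2**30


for size in DISTILLED_B:
    print(
        f"{size:>5}B  {size / TOTAL_PARAMS_B:6.1%} of total  "
        f"{size / ACTIVE_PARAMS_B:7.1%} of active  "
        f"fp16 {weights_gib(size, 16):6.1f} GiB  4-bit {weights_gib(size, 4):5.1f} GiB"
    )
```

On these numbers the 14B variant is about 2.1% of the 671B total, and the 1.5B variant needs under 3 GiB for 16-bit weights, which is consistent with it being described as laptop-runnable.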

Connections 3

Outbound 2

scoped_to 1

AI Memory Infrastructure

depends_on 1

DeepSeek V3

Inbound 1

enables 1

DeepSeek V3