DeepSeek V4 | LLMS3

Summary

What it is

DeepSeek's flagship V3 successor, served as V4-Pro (1M context) and V4-Flash, whose 75% price cut became the permanent list price on May 22, 2026. Per [DeepSeek V4-Pro 75% Price Cut Goes Permanent](https://codersera.com/blog/deepseek-v4-pro-permanent-price-cut-may-2026/).

Where it fits

It is a model-class node on the cost-of-inference axis of the site. Its permanent price floor ($0.435/$0.87 per MTok for Pro) makes token-heavy patterns — agentic retrieval, validators, large-batch data processing on S3-backed corpora — economically viable, and it sets a reference price competitors must answer.

Misconceptions / Traps

The flagship API name is V4-Pro, not bare "V4"; V4-Flash is a separate, cheaper tier — don't conflate their prices.
"Permanent" is the publisher's framing for a non-expiring list price, not a contractual guarantee; verify live rates before quoting.
The cache-hit price ($0.003625/M Pro, $0.0028/M Flash) only applies to cached input, not all input — blended cost depends on cache-hit ratio.
Competitor figures (GPT-5.5, Opus 4.7) come from third-party analysis, not DeepSeek's docs; treat multipliers as reported, not official.

Key Connections

extends DeepSeek V3 — V4 is the direct successor flagship line.
solves High Cloud Inference Cost — its permanent floor is the clearest 2026 datapoint on collapsing inference cost.
optimizes_for Performance-per-Dollar — positioned explicitly on price/capability against US frontier models.
competes_with GPT-5.5 / Claude Opus — third-party pricing puts it ~11-34x cheaper per token.

Definition

What it is

DeepSeek V4 is DeepSeek's flagship model family and the successor to V3, served via the first-party API in two tiers: V4-Pro (1M context, thinking and non-thinking modes) and the cheaper V4-Flash. On May 22, 2026 DeepSeek made its 75% V4-Pro discount permanent — converting a promotion that was set to expire May 31, 2026 into the standing list price. Per [DeepSeek V4-Pro 75% Price Cut Goes Permanent](https://codersera.com/blog/deepseek-v4-pro-permanent-price-cut-may-2026/).

Why it exists

Cheap, capable inference is the economic precondition for agent-heavy data infrastructure — agentic retrieval (DCI), validators, and LLM-assisted catalog/data systems all burn tokens. By publishing a non-expiring price floor, V4 directly drives down the cost-of-inference for self-hosted-adjacent AI-data pipelines. Per [DeepSeek's steep V4-Pro price cut escalates AI pricing war (InfoWorld)](https://www.infoworld.com/article/4176709/deepseeks-steep-v4-pro-price-cut-escalates-ai-pricing-war.html).

Primary use cases

Cost-sensitive high-volume inference, agentic retrieval and search loops, batch document/data processing, long-context (1M) reasoning, LLM-assisted data and catalog tooling.

Recent developments

Latest signals

V4 preview ships open weights at 1.6T parameters — within 0.2 points of the closed US frontier. The V4 preview (April 24, 2026) released V4-Pro at 1.6 trillion total parameters (49B active) plus a 284B V4-Flash, both as open weights — making V4-Pro the largest open-weights model available to download and self-host. It scores 80.6% on SWE-bench Verified (within 0.2 pts of Claude Opus 4.6), 93.5% on LiveCodeBench, a Codeforces rating of 3206, and a 1M-token context. DeepSeek frames it as trailing state-of-the-art closed models by only 3–6 months — the capability half of the pricing story below: frontier-adjacent quality you can run behind your own data-residency boundary. Per the DeepSeek V4-Pro complete guide.
75% V4-Pro price cut made permanent on May 22, 2026. What was a promo expiring May 31 became the standing rate: $0.435/M input (cache miss), $0.87/M output, $0.003625/M cached input. Per Models & Pricing | DeepSeek API Docs.
V4-Flash tier published at a still-lower point. $0.14/M input (cache miss), $0.28/M output, $0.0028/M cached input. Per Models & Pricing | DeepSeek API Docs.
Roughly an order of magnitude under US frontier APIs. V4-Pro is reported ~11.5x cheaper than GPT-5.5 on input and ~34.5x cheaper on output, and ~28.7x cheaper per output token than Claude Opus 4.7 ($0.435/$0.87 vs GPT-5.5 $5/$30 and Opus 4.7 $5/$25). Per DeepSeek V4-Pro 75% Price Cut Goes Permanent.
Framed as an efficiency pass-through, not a loss-leader. Analysts characterize it as "an efficiency gain being passed through," which is why it is permanent rather than promotional, pressuring OpenAI, Anthropic, and Google. Per DeepSeek's steep V4-Pro price cut escalates AI pricing war (InfoWorld).