Definition

What it is

A new storage tier, also referred to as **Context Memory eXtension (CMX)**, that sits between traditional NVMe SSDs and cold S3 buckets and is optimized specifically for AI inference state. It uses high-performance DPUs (NVIDIA BlueField-4) and DPU-attached flash to offload data placement and context retrieval at the pod level. Solidigm and other flash vendors are productizing CMX as a distinct SKU class, separate from general-purpose enterprise SSDs, with media tuned for the bursty, mixed read/write access patterns of agentic state and KV-cache offloading.

Why it exists

AI inference workloads have a specific shape that traditional storage hardware doesn't serve well: long sessions interleaved with short bursts, KV-cache state that is written once and read many times, and agent memory that persists for hours to days but doesn't need cold-tier economics. CMX-class storage targets that workload, with hardware-accelerated paths for the most common inference operations (KV-cache lookup, agent-state checkpoint, prefill token streaming).
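The write-once, read-many shape of KV-cache state can be illustrated with a small tiered lookup. This is a minimal sketch under stated assumptions, not how any CMX product works: the `Tier` and `TieredKVCache` classes, the tier names, the LRU eviction policy, and the promote-on-hit behavior are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class Tier:
    """One level of a hypothetical KV-cache hierarchy, with LRU eviction."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity  # maximum number of blocks held
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, key: str, value: bytes) -> Optional[Tuple[str, bytes]]:
        """Store a block; return the evicted (key, value) pair, if any."""
        self.blocks[key] = value
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            return self.blocks.popitem(last=False)  # least recently used
        return None


class TieredKVCache:
    """Tiers are ordered fastest first; evictions spill to the next tier."""

    def __init__(self, tiers: List[Tier]) -> None:
        self.tiers = tiers

    def _insert(self, level: int, key: str, value: bytes) -> None:
        evicted = self.tiers[level].put(key, value)
        if evicted is not None and level + 1 < len(self.tiers):
            self._insert(level + 1, *evicted)
        # A block evicted from the last tier is simply dropped in this sketch.

    def write(self, key: str, value: bytes) -> None:
        """Write a block once, into the fastest tier."""
        self._insert(0, key, value)

    def read(self, key: str) -> Optional[Tuple[str, bytes]]:
        """Return (tier name that served the hit, value), or None on a miss."""
        for level, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                if level > 0:
                    # Promote the block so repeated reads hit the fast tier.
                    del tier.blocks[key]
                    self._insert(0, key, value)
                return tier.name, value
        return None


# Hypothetical two-tier setup: a tiny "gpu" tier backed by a larger "cmx" tier.
cache = TieredKVCache([Tier("gpu", 1), Tier("cmx", 2)])
cache.write("session-a", b"kv-a")
cache.write("session-b", b"kv-b")   # spills session-a down to "cmx"
print(cache.read("session-a"))      # served from "cmx", then promoted
print(cache.read("session-a"))      # served from "gpu"
```

The point of the sketch is the access pattern: each block is written once, may be read many times, and a middle tier absorbs blocks that no longer fit in the fastest memory so they do not have to be recomputed.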

Primary use cases

Persistent KV-cache pools for disaggregated prefill, "instant resume" agentic state across cluster failures, per-pod inference scratchpads, shared-tenant KV-cache fabric, inference-aware data placement targets.
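The "instant resume" use case can be pictured as a checkpoint-and-restore loop against a persistent tier. The sketch below assumes the tier is exposed as an ordinary filesystem path, which the source does not state; the `checkpoint` and `resume` functions, the JSON format, and the session ID naming are hypothetical and chosen only to keep the example self-contained.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def checkpoint(state: dict, root: Path, session_id: str) -> Path:
    """Persist agent state so a replacement worker can pick it up later."""
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{session_id}.json"
    # Write to a temporary file first so a crash never leaves a torn checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, target)  # atomic rename within one filesystem
    return target


def resume(root: Path, session_id: str) -> Optional[dict]:
    """Load the last checkpoint for a session, or None if there is none."""
    target = root / f"{session_id}.json"
    if not target.exists():
        return None
    return json.loads(target.read_text())


# Usage: a temporary directory stands in for the persistent tier's mount point.
with tempfile.TemporaryDirectory() as mount:
    tier_root = Path(mount) / "agent-state"
    checkpoint({"step": 7, "scratchpad": ["plan", "act"]}, tier_root, "agent-42")
    restored = resume(tier_root, "agent-42")  # e.g. after a worker restart
    print(restored)
```

Writing to a temporary file and renaming it is a general durability pattern rather than anything specific to CMX; it is used here so that a failure mid-checkpoint leaves the previous state readable.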

Recent developments

Latest signals

NVIDIA standardized ICMS (Inference Context Memory Storage) / CMX as a new G3.5 tier: Ethernet-attached flash for KV cache. This tier 3.5 sits between local SSD (Tier 3) and cold S3 (Tier 4) and is optimized for KV-cache and multi-step MoE inference. NVIDIA and its storage partners ratified the framing in Q1 2026. Per NVIDIA Technical Blog, "BlueField-4-Powered CMX Context Memory Storage Platform", and Blocks & Files, "Nvidia's basic context memory extension infrastructure".
Storage-partner roster: VAST, DDN, Dell, HPE, Pure Storage, WEKA. Six major enterprise-storage vendors have signed up to ship ICMS-class tier-3.5 hardware. Cross-vendor adoption is what makes ICMS a market category rather than a single-vendor pitch. Per Blocks & Files, "Nvidia pushes AI inference context out to NVMe SSDs".
VAST CNode on BlueField-4 demonstrated a 90% inference-efficiency gain and a 20× improvement in time to first token (TTFT). The numbers come with a concrete mechanism: zero-copy transfer from remote SSD to GPU memory via a virtio-fs control plane, and KV-cache offload via VAST Undivided Attention (VUA, open source). Per VAST Data, "More Inference, Less Infrastructure: VAST and NVIDIA".
Solidigm is productizing CMX as a distinct SSD SKU. Solidigm explicitly positions CMX as a "different SSD product class", with media tuned for the mixed read/write, bursty access pattern of inference context rather than the steady-state, read-heavy pattern of training data. The SSD market is segmenting around AI workload shapes. Per Solidigm, "ICMS: AI Inference is a Flash Storage Problem".
NetApp introduced the "new memory tier for AI" framing in industry terms. This is an adoption signal at the community-discussion level: NetApp and the storage-industry press now talk about ICMS as the canonical 4th memory tier, not just an NVIDIA marketing label. Per NetApp Community, "Introducing a New Memory Tier for AI: Inference Context Memory Storage".
Ship timeline: ICMS partner hardware is expected in H2 2026. Hardware partners (DDN, VAST, Dell, etc.) are expected to begin shipping ICMS-class systems in the second half of 2026, broadly aligned with the NVIDIA Vera Rubin + BlueField-4 STX rollout window. Per Blocks & Files, "Nvidia pushes inference context to NVMe".