Technology

Inference Context Memory Storage (ICMS)

Definition

What it is

Inference Context Memory Storage (ICMS), also referred to as **Context Memory eXtension (CMX)**, is a new storage tier that sits between traditional NVMe SSDs and cold object storage such as S3 buckets, and is optimized specifically for AI inference state. It uses high-performance DPUs (NVIDIA BlueField-4) and DPU-attached flash to offload data placement and context retrieval at the pod level. Solidigm and other flash vendors are productizing CMX as a distinct SKU class, separate from general-purpose enterprise SSDs, with media tuned for the bursty, mixed read/write access patterns of agentic state and KV-cache offloading.

Why it exists

AI inference workloads have a shape that traditional storage hardware serves poorly: long sessions interleaved with short bursts, KV-cache state that is written once and read many times, and agent memory that persists for hours to days but does not need cold-tier economics. CMX-class storage targets that workload, with hardware-accelerated paths for the most common inference operations (KV-cache lookup, agent-state checkpoint, prefill token streaming).
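To make the write-once, read-many KV-cache pattern concrete, here is a minimal Python sketch of a tiered lookup. It is an illustration under stated assumptions, not a vendor API: `TieredKVCache`, `prefix_keys`, and `BLOCK_TOKENS` are hypothetical names, and two in-memory dictionaries stand in for a small fast tier (such as GPU or host memory) and a larger context-memory tier (such as DPU-attached flash).

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK_TOKENS = 4  # hypothetical block size; real systems use larger blocks


def prefix_keys(tokens: List[int], block: int = BLOCK_TOKENS) -> List[str]:
    """Return one key per full block; each key hashes the entire prefix so far."""
    keys = []
    h = hashlib.sha256()
    for end in range(block, len(tokens) + 1, block):
        h.update(repr(tokens[end - block:end]).encode())
        keys.append(h.copy().hexdigest())  # copy() snapshots the running hash
    return keys


class TieredKVCache:
    """Toy two-tier cache: a small fast tier in front of a larger context tier."""

    def __init__(self, fast_capacity: int) -> None:
        self.fast: Dict[str, bytes] = {}      # stand-in for GPU/host memory
        self.context: Dict[str, bytes] = {}   # stand-in for the context-memory tier
        self.fast_capacity = fast_capacity

    def _promote(self, key: str, kv: bytes) -> None:
        if self.fast_capacity <= 0 or key in self.fast:
            return
        if len(self.fast) >= self.fast_capacity:
            # Evict the oldest-inserted entry (dicts keep insertion order).
            self.fast.pop(next(iter(self.fast)))
        self.fast[key] = kv

    def put(self, key: str, kv: bytes) -> None:
        # Write once: a block is immutable, so an existing entry is never replaced.
        self.context.setdefault(key, kv)
        self._promote(key, kv)

    def get(self, key: str) -> Tuple[Optional[bytes], str]:
        """Return (value, tier), where tier is 'fast', 'context', or 'miss'."""
        if key in self.fast:
            return self.fast[key], "fast"
        if key in self.context:
            kv = self.context[key]
            self._promote(key, kv)  # read many: pull hot blocks forward
            return kv, "context"
        return None, "miss"

    def longest_cached_prefix(self, tokens: List[int]) -> int:
        """Count leading tokens whose KV blocks are already cached in any tier."""
        cached = 0
        for key in prefix_keys(tokens):
            kv, _ = self.get(key)
            if kv is None:
                break
            cached += BLOCK_TOKENS
        return cached
```

In this sketch, a request that shares a prefix with an earlier session would call `longest_cached_prefix` and recompute only the tokens past the returned count. Keying each block by a hash of its full prefix is one common way to make blocks safely shareable; the actual placement and lookup paths in a CMX-class device are not described in the source and are not modeled here.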

Primary use cases

Persistent KV-cache pools for disaggregated prefill, "instant resume" of agentic state across cluster failures, per-pod inference scratchpads, a shared-tenant KV-cache fabric, and inference-aware data placement targets.

Featured in