Definition

What it is

A commercial **memory orchestration** platform for AI workloads, providing software-defined coordination of CXL-attached memory pools, GPU HBM, and CXL-connected NVMe across distributed inference clusters. MemVerge's framing: as the hardware substrate becomes more heterogeneous (HBM3e → DRAM → CXL.mem → NVMe → S3), the software layer that decides *which memory to use for which workload* becomes the load-bearing decision point. The platform exposes APIs that let inference engines request memory by characteristics (latency budget, durability requirement, capacity) rather than by hardware tier.
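Requesting memory by characteristics rather than by tier can be sketched as follows. This is a minimal illustration only, not MemVerge's API: `Tier`, `MemoryRequest`, `select_tier`, and every latency and capacity figure are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    latency_ns: float   # placeholder order of magnitude, not a measurement
    durable: bool       # True if contents survive power loss
    free_bytes: int


@dataclass(frozen=True)
class MemoryRequest:
    size_bytes: int
    max_latency_ns: float
    needs_durability: bool = False


GIB = 1 << 30

# Hypothetical inventory; every number here is an illustrative assumption.
TIERS = (
    Tier("HBM3e", 100, False, 64 * GIB),
    Tier("DRAM", 150, False, 512 * GIB),
    Tier("CXL.mem", 400, False, 8192 * GIB),
    Tier("NVMe", 100_000, True, 65536 * GIB),
    Tier("S3", 50_000_000, True, 1 << 50),
)


def select_tier(req: MemoryRequest, tiers: Sequence[Tier] = TIERS) -> Optional[Tier]:
    """Return the slowest tier that still satisfies the request, or None."""
    fits = [
        t for t in tiers
        if t.latency_ns <= req.max_latency_ns
        and t.free_bytes >= req.size_bytes
        and (t.durable or not req.needs_durability)
    ]
    # Picking the slowest qualifying tier keeps scarce fast memory free
    # for requests with tighter latency budgets.
    return max(fits, key=lambda t: t.latency_ns, default=None)


req = MemoryRequest(size_bytes=4 * GIB, max_latency_ns=500)
print(select_tier(req).name)  # CXL.mem
```

The caller states a latency budget, a durability flag, and a size; the tier name only appears in the result. Choosing the slowest tier that meets the budget is one possible policy among several, and a real orchestrator would presumably also weigh bandwidth, cost, and current load.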

Why it exists

Manual memory tiering doesn't scale across the new memory hierarchy. When an inference deployment has GPUs with HBM3e, hosts with CXL-attached DRAM pools, CXL-connected NVMe, and S3 buckets, the question "where should this KV-cache live" has many defensible answers, and the best one depends on workload shape. MemVerge automates the choice based on workload telemetry.

Primary use cases

Cross-tier memory orchestration for AI inference, CXL memory pool management, KV-cache placement decisions across heterogeneous hardware, AI memory fabric coordination, per-workload memory-characteristic provisioning.

Recent developments

Latest signals

XConn + MemVerge demo at SC25 / OCP 2025: 5×+ performance vs SSD or RDMA KV-cache offload. The joint demo used a CXL memory pool for KV-cache offload with prefill/decode disaggregation and reported a gain of more than 5× over both SSD-based caching and RDMA-only KV-cache offloading on real AI inference workloads. Per StorageNewsletter, "XConn + MemVerge: Breakthrough Scalable CXL Memory Solution for KV Cache Offload".
NVIDIA Dynamo integration validated at OCP 2025. XConn + MemVerge demonstrated a CXL memory pool serving as the KV-cache backing store for NVIDIA Dynamo at OCP Global Summit. This is the inference-framework-side interop signal that CXL pools are production-ready for KV-cache duty. Per PRWeb, "XConn + MemVerge CXL Memory Pool for KV Cache with Dynamo at OCP 2025".
GISMO API: 675% remote-data-access speedup, 280% shuffle speedup on Ray. Global IO-Free Shared Memory Objects (GISMO) is MemVerge's distributed shared-memory API; the cited benchmarks on the Ray AI framework compare it against traditional remote-IO and shuffle paths. Per MemVerge, "Memory Machine CXL".
Commercial CXL memory pools at 100TiB available 2025; larger deployments planned 2026. Scale milestone: commercial CXL pools reached the 100TiB capacity point in 2025, with larger deployments planned for 2026. The "memory pool as the new SAN" framing is now an operational reality, not a roadmap. Per StorageNewsletter, "SC25 XConn + MemVerge".
"Persisting the cognitive state of agents, LLMs, microservices across workloads." MemVerge's 2026 positioning extends memory orchestration beyond the CXL + DPU fabrics themselves to the orchestration plane that governs them. The framing argues that memory orchestration is now an AI-runtime concern, not a hardware-management concern. Per MemVerge, "Inside the AI Memory Layer".
Dynamic tiering + placement policies across HBM / DRAM / CXL.mem / NVMe / S3. Memory Machine adjudicates placement across the full memory hierarchy using workload telemetry. Applications carry no per-app tiering code; the orchestrator decides per request where data lives. Per Introl, "CXL Memory Expansion + Pooling + Disaggregated Memory 2025".
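
Telemetry-driven tiering of the kind described in the last item can be sketched as a loop that counts accesses per object and moves hot objects up and cold objects down one tier per interval. This is a toy model under stated assumptions, not Memory Machine's actual policy: the `TieringPolicy` class, its thresholds, and the default starting tier are all hypothetical.

```python
from collections import Counter

# Fastest to slowest, following the hierarchy named in the text.
TIER_ORDER = ["HBM", "DRAM", "CXL.mem", "NVMe", "S3"]


class TieringPolicy:
    """Toy hot/cold policy: thresholds are illustrative assumptions."""

    def __init__(self, promote_at: int = 8, demote_at: int = 1):
        self.promote_at = promote_at   # accesses per interval to move up
        self.demote_at = demote_at     # accesses per interval to move down
        self.hits = Counter()          # telemetry for the current interval
        self.placement = {}            # object key -> tier name

    def record_access(self, key: str, default_tier: str = "DRAM") -> None:
        # New objects start in the default tier; each access is counted.
        self.placement.setdefault(key, default_tier)
        self.hits[key] += 1

    def rebalance(self) -> dict:
        """Apply one interval of telemetry; return {key: (old, new)} moves."""
        moves = {}
        for key, tier in self.placement.items():
            idx = TIER_ORDER.index(tier)
            count = self.hits[key]
            if count >= self.promote_at and idx > 0:
                idx -= 1               # hot: one tier faster
            elif count <= self.demote_at and idx < len(TIER_ORDER) - 1:
                idx += 1               # cold: one tier slower
            if TIER_ORDER[idx] != tier:
                moves[key] = (tier, TIER_ORDER[idx])
        for key, (_, new_tier) in moves.items():
            self.placement[key] = new_tier
        self.hits.clear()              # start a fresh telemetry interval
        return moves


policy = TieringPolicy()
for _ in range(10):
    policy.record_access("kv:hot")
policy.record_access("kv:cold")
print(policy.rebalance())
```

The application code only calls `record_access`; all placement logic lives in the policy object, which mirrors the "no per-app tiering code" claim in the text. A production system would likely use richer signals than raw access counts.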

Connections 5

Outbound 5

scoped_to 3

Inference Locality
AI Memory Infrastructure
GPU + Object Storage Convergence

orchestrates 1

AI Memory Infrastructure

acts_as 1

AI Runtime Infrastructure
