Standard

RDMA (RoCE v2 / InfiniBand)

A family of network transports for direct memory-to-memory data transfer between machines. The network adapter moves the data, bypassing the operating system kernel and keeping host CPUs off the data path, which lowers latency and raises throughput. In early 2026, NVIDIA shipped RDMA client and server libraries for S3-compatible storage as part of the CUDA Toolkit, marking the transition from niche technical preview to standard "AI Factory" infrastructure.

Summary

Where it fits

RDMA is the high-performance network transport used by storage systems that need microsecond-level access latency. Object storage systems serving AI/ML workloads use RDMA to achieve storage access times that approach local NVMe, enabling GPU-direct data paths. The NVIDIA CUDA Toolkit integration means GPU clusters can now access S3-compatible storage over RDMA without custom driver work.

Misconceptions / Traps

RDMA requires specialized network infrastructure. RoCE v2 works on lossless Ethernet (requires PFC/ECN configuration); InfiniBand requires dedicated switches and HCAs.
RDMA performance is highly sensitive to network configuration. Incorrect QoS, PFC, or ECN settings can make performance worse than standard TCP.
The NVIDIA CUDA Toolkit RDMA libraries target S3-compatible storage specifically; not all object stores support the required RDMA transport yet.
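
The configuration trap above can be illustrated with a small audit sketch. It assumes a hypothetical inventory of switch ports described by a made-up `PortConfig` record; the field names, port names, and the choice of priority 3 for RoCE traffic are illustrative assumptions, not the output format of any real switch or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical per-port settings; not tied to any vendor's data model."""
    name: str
    pfc_priorities: frozenset  # traffic priorities with PFC enabled
    ecn_enabled: bool


def audit_ports(ports, roce_priority):
    """Return findings for ports that would break the lossless RoCE class."""
    findings = []
    for port in ports:
        if roce_priority not in port.pfc_priorities:
            findings.append(f"{port.name}: PFC not enabled on priority {roce_priority}")
        if not port.ecn_enabled:
            findings.append(f"{port.name}: ECN marking disabled")
    return findings


# Assumed example fabric: one healthy port and two misconfigured ones.
ports = [
    PortConfig("leaf1/eth1", frozenset({3}), True),
    PortConfig("leaf1/eth2", frozenset({4}), True),    # PFC on the wrong priority
    PortConfig("spine1/eth7", frozenset({3}), False),  # ECN missing
]

for finding in audit_ports(ports, roce_priority=3):
    print(finding)
```

A single inconsistent port is enough to matter, because PFC and ECN only protect the traffic class if every hop on the path agrees on it.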

Key Connections

enables RDMA-Accelerated Object Access — the transport protocol for microsecond object access
enables GPU-Direct Storage Pipeline — direct storage-to-GPU data path
scoped_to Object Storage — underlying transport for high-performance storage

Definition

What it is

Network transport protocols enabling Remote Direct Memory Access (RDMA): transferring data directly between application memory on different servers, with the network adapters moving the data so that host CPUs and OS kernels stay off the data path, achieving microsecond-level latency. In early 2026, NVIDIA released RDMA client and server libraries for S3-compatible storage as part of the CUDA Toolkit, transitioning RDMA from niche technical preview to a standard component of "AI Factory" infrastructure.
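
As a rough mental model of a one-sided RDMA write, the toy sketch below runs entirely in one Python process. It is not real RDMA and uses no RDMA library: `ToyNic`, `MemoryRegion`, and `rdma_write` are hypothetical names invented for illustration. It only shows the idea that the target registers a buffer once, hands out a key, and the adapter then places incoming data into that buffer without the target application running any code per transfer.

```python
import secrets


class MemoryRegion:
    """A registered buffer plus the key a remote peer must present to access it."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.rkey = secrets.randbits(32)  # stand-in for the key a real adapter issues


class ToyNic:
    """Models the target-side adapter, which places data into registered memory."""

    def __init__(self):
        self._regions = {}
        self.app_callbacks = 0  # never incremented: the target application is not involved

    def register(self, size: int) -> MemoryRegion:
        region = MemoryRegion(size)
        self._regions[region.rkey] = region
        return region

    def rdma_write(self, rkey: int, offset: int, payload: bytes) -> None:
        region = self._regions.get(rkey)
        if region is None:
            raise PermissionError("unknown rkey")
        if offset < 0 or offset + len(payload) > len(region.buf):
            raise ValueError("write outside the registered region")
        region.buf[offset:offset + len(payload)] = payload


target_nic = ToyNic()
region = target_nic.register(64)          # done once, ahead of the transfer
target_nic.rdma_write(region.rkey, 8, b"object-chunk")
print(bytes(region.buf[8:20]))
```

The bounds and key checks stand in for the protection a real adapter enforces on registered memory; everything else about real transports (queue pairs, completions, connection setup) is deliberately left out.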

Why it exists

HTTP/TCP-based S3 access introduces millisecond-scale latency. For internal object storage data paths (inter-node replication, erasure-coding reconstruction), RDMA removes most of that protocol overhead, bringing storage fabric performance closer to local memory access.
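
To make the millisecond versus microsecond gap concrete, the sketch below computes how many strictly dependent (one-at-a-time) requests a single stream can complete per second at a given round-trip latency. The 1,000 µs and 5 µs figures are illustrative assumptions chosen to represent "millisecond-scale" and "microsecond-level", not measurements of any particular system.

```python
US_PER_SECOND = 1_000_000


def dependent_requests_per_second(latency_us: int) -> float:
    """Upper bound on serial request rate when each request waits for the previous one."""
    return US_PER_SECOND / latency_us


# Assumed, illustrative latencies (not measured values).
HTTP_TCP_LATENCY_US = 1_000
RDMA_LATENCY_US = 5

for label, latency_us in [("HTTP/TCP", HTTP_TCP_LATENCY_US), ("RDMA", RDMA_LATENCY_US)]:
    rate = dependent_requests_per_second(latency_us)
    print(f"{label}: {rate:,.0f} dependent requests/s")
```

Parallelism hides much of this gap for bulk throughput, so the difference matters most for chains of small, dependent operations such as metadata lookups or reconstruction steps.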

Primary use cases

High-performance inter-node replication, erasure-coding reconstruction acceleration, AI/ML storage fabric, low-latency data movement within storage clusters, NVIDIA CUDA Toolkit GPU-direct S3 access.

Recent developments

Latest signals

Ethernet (RoCE v2) is winning the AI fabric war, with ~70% of new deployments. Broadcom's March 2026 earnings confirmed that ~70% of new AI infrastructure deployments are choosing Ethernet-based fabrics over InfiniBand. Meta, Microsoft, and AWS are converging on RoCE v2 for operational reasons (existing Ethernet skills and open-vendor sourcing). Per Rack2Cloud, "InfiniBand vs RoCEv2: Why Ethernet Is Winning", and NetPilot, "RoCEv2 vs InfiniBand: AI Cluster Networking Compared 2026".
Performance gap narrowed: InfiniBand 1–2µs vs RoCE v2 2–5µs. RoCE v2 hits 85–95% of InfiniBand's training throughput for tier-2/3 deployments (256–1,024 GPUs). InfiniBand still wins at frontier scale (>10K GPUs) where the latency delta compounds. Per FirstPassLab — RoCE vs InfiniBand for AI Data Center Networking 2026.
Ultra Ethernet Consortium (UEC) is emerging as the third option. UEC builds next-gen Ethernet specifically for AI workloads, with built-in reliability that eliminates PFC entirely, adaptive AI-tuned congestion control, and native RDMA. Production deployments are expected from H2 2026 onward. Per Stordis, "Ultra Ethernet vs InfiniBand, RoCE and TCP - AI", and Medium, "From InfiniBand to Ultra Ethernet: Why AI Networks Rethought RDMA".
Lossless Ethernet (PFC + ECN) is the table-stakes RoCE v2 deployment pattern. Production RoCE v2 requires Priority Flow Control + Explicit Congestion Notification across the fabric — getting this configuration right is the load-bearing operational task that separates working RoCE v2 deployments from broken ones. Per Intelligent Visibility — Lossless Ethernet Design Guide for AI Fabrics 2026.
iWARP is dead; RoCE v2, InfiniBand, and UEC are the three options. Intelligent Visibility's RDMA-for-Storage guide explicitly retires iWARP from the production option set: the third option that competed with RoCE v2 and InfiniBand a decade ago has effectively zero new deployments. Per Intelligent Visibility, "RDMA for Storage Ethernet: RoCE vs iWARP".
RDMA-for-S3 is the convergence point — cuObject, MinIO AIStor, Cloudian, VAST all wire it via RoCE v2 or InfiniBand. Cross-vendor signal: every major S3-RDMA implementation in 2026 sits on top of RoCE v2 or InfiniBand. The protocol layer is settled; the storage-vendor implementations are where the differentiation lives. Per Distributed AI Fabrics — InfiniBand, RDMA, Lossless Ethernet Strategy Guide.
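
One way to see why the latency delta quoted above compounds with scale is a back-of-the-envelope calculation. The sketch assumes a flat ring all-reduce, which takes 2(N - 1) sequential communication steps for N GPUs, and uses the midpoints of the quoted ranges (1.5 µs and 3.5 µs) as per-step latencies. Both choices are simplifying assumptions: real clusters often use hierarchical or tree collectives, and the latency term is only one part of the total cost.

```python
def ring_allreduce_steps(num_gpus: int) -> int:
    """Sequential steps in a flat ring all-reduce: reduce-scatter plus all-gather."""
    return 2 * (num_gpus - 1)


def added_latency_us(num_gpus: int, per_step_delta_us: float) -> float:
    """Extra latency accumulated over one all-reduce from a per-step latency delta."""
    return ring_allreduce_steps(num_gpus) * per_step_delta_us


# Assumed per-step latencies: midpoints of the ranges quoted in the text.
INFINIBAND_US = 1.5
ROCE_V2_US = 3.5
delta_us = ROCE_V2_US - INFINIBAND_US

for num_gpus in (256, 1_024, 10_000):
    extra = added_latency_us(num_gpus, delta_us)
    print(f"{num_gpus:>6} GPUs: +{extra:,.0f} µs per all-reduce")
```

Under these assumptions the extra latency grows linearly with GPU count, from about a millisecond at 256 GPUs to tens of milliseconds at 10,000, which is the sense in which a small per-hop difference becomes significant at frontier scale.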

Connections 6

Outbound 2

scoped_to 1

Object Storage

enables 1

RDMA-Accelerated Object Access

Inbound 4

implements 1

NVIDIA GPUDirect RDMA for S3

depends_on 3

NVIDIA GPUDirect RDMA for S3

Cloudian HyperStore

RDMA-Accelerated Object Access

Resources 2

Docs (High)

www.nvidia.com/en-us/networking/

NVIDIA networking solutions page covering InfiniBand and RoCE products for RDMA-accelerated data center communication.

Paper (High)

developer.nvidia.com/networking

Authoritative whitepaper on deploying RoCE v2 in data centers, covering lossless Ethernet configuration and performance analysis.