RDMA (RoCE v2 / InfiniBand)
A network transport protocol for direct memory-to-memory data transfer between machines, bypassing the operating system kernel and CPU for minimal latency and maximum throughput. In early 2026, NVIDIA shipped RDMA client and server libraries for S3-compatible storage as part of the CUDA Toolkit, marking the transition from niche technical preview to standard "AI Factory" infrastructure.
Summary
RDMA is the high-performance network fabric used by storage systems that need microsecond-level access. Object storage systems serving AI/ML workloads use RDMA to achieve storage access times that approach local NVMe, enabling GPU-direct data paths. The NVIDIA CUDA Toolkit integration means GPU clusters can now access S3-compatible storage over RDMA without custom driver work.
- RDMA requires specialized network infrastructure. RoCE v2 works on lossless Ethernet (requires PFC/ECN configuration); InfiniBand requires dedicated switches and HCAs.
- RDMA performance is highly sensitive to network configuration. Incorrect QoS, PFC, or ECN settings cause performance worse than standard TCP.
- The NVIDIA CUDA Toolkit RDMA libraries target S3-compatible storage specifically; not all object stores support the required RDMA transport yet.
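To make the lossless-Ethernet caveat concrete, the sketch below shows the kind of per-NIC configuration a RoCE v2 deployment typically involves. The interface name `eth0`, priority 3, and the `mlnx_qos` tool are assumptions tied to NVIDIA/Mellanox (mlx5) NICs; other vendors expose different tooling, and switch-side PFC/ECN must be configured to match.

```shell
# Enable Priority Flow Control on priority 3 (a common choice for RoCE
# traffic) -- mlnx_qos ships with NVIDIA/Mellanox NIC tools.
mlnx_qos -i eth0 --pfc 0,0,0,1,0,0,0,0

# Enable ECN marking for RoCE congestion control (DCQCN) on the same
# priority; these sysfs paths are specific to mlx5 NICs.
echo 1 > /sys/class/net/eth0/ecn/roce_np/enable/3
echo 1 > /sys/class/net/eth0/ecn/roce_rp/enable/3

# Confirm the RDMA device is visible and its port is in the ACTIVE state.
ibv_devinfo
```

The switch fabric must honor the same priority end to end; PFC enabled on the NIC but not the switch (or vice versa) is a classic source of the worse-than-TCP behavior described above.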
- enables → RDMA-Accelerated Object Access — the transport protocol for microsecond object access
- enables → GPU-Direct Storage Pipeline — direct storage-to-GPU data path
- scoped_to → Object Storage — underlying transport for high-performance storage
Definition
Network transport protocols enabling Remote Direct Memory Access — transferring data directly between application memory on different servers without involving the CPU or OS kernel, achieving microsecond-level latency. In early 2026, NVIDIA released RDMA client and server libraries for S3-compatible storage as part of the CUDA Toolkit, transitioning RDMA from niche technical preview to a standard component of "AI Factory" infrastructure.
HTTP/TCP-based S3 access incurs millisecond-scale latency from the kernel networking stack and protocol overhead. For internal object-storage data paths (inter-node replication, erasure-coding reconstruction), RDMA bypasses that overhead, bringing storage-fabric latency closer to local memory access.
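A quick way to sanity-check that an RDMA path actually delivers microsecond-level latency is the `perftest` suite. The device name `mlx5_0` and the hostname below are placeholders; substitute your own RDMA device (as listed by `ibv_devinfo`) and server address.

```shell
# On the server node: listen for a one-sided RDMA read latency test.
ib_read_lat -d mlx5_0

# On the client node: run the latency test against the server.
ib_read_lat -d mlx5_0 server-hostname
```

On a correctly tuned RoCE v2 or InfiniBand fabric, small-message latencies land in the low single-digit microseconds; results close to TCP round-trip times usually point back to PFC/ECN or QoS misconfiguration.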
High-performance inter-node replication, erasure-coding reconstruction acceleration, AI/ML storage fabric, low-latency data movement within storage clusters, NVIDIA CUDA Toolkit GPU-direct S3 access.
Connections (3)
Outbound (2): scoped_to (1), enables (1)
Inbound (1): depends_on (1)
Resources (2)
NVIDIA networking solutions page covering InfiniBand and RoCE products for RDMA-accelerated data center communication.
Authoritative whitepaper on deploying RoCE v2 in data centers, covering lossless Ethernet configuration and performance analysis.