Standard

RDMA (RoCE v2 / InfiniBand)

Summary

What it is

A family of network transport protocols for direct memory-to-memory data transfer between machines: data moves straight between application buffers on different hosts, bypassing the operating system kernel's network stack, and one-sided operations complete without involving the remote CPU, giving minimal latency and maximum throughput. In early 2026, NVIDIA shipped RDMA client and server libraries for S3-compatible storage as part of the CUDA Toolkit, marking the transition from niche technical preview to standard "AI Factory" infrastructure.

Where it fits

RDMA is the high-performance network fabric used by storage systems that need microsecond-level access. Object storage systems serving AI/ML workloads use RDMA to achieve storage access times that approach local NVMe, enabling GPU-direct data paths. The NVIDIA CUDA Toolkit integration means GPU clusters can now access S3-compatible storage over RDMA without custom driver work.

Misconceptions / Traps
  • "RDMA requires exotic hardware" is only half true: RoCE v2 runs over standard Ethernet, but the fabric must be made lossless (PFC and ECN configured end-to-end); InfiniBand does require dedicated switches and HCAs.
  • RDMA performance is highly sensitive to network configuration: incorrect QoS, PFC, or ECN settings can make it perform worse than plain TCP.
  • The NVIDIA CUDA Toolkit RDMA libraries target S3-compatible storage specifically; not all object stores support the required RDMA transport yet.
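The fabric-tuning trap above can be made concrete. Below is a minimal sketch of making one priority lossless for RoCE v2 on a Mellanox/NVIDIA ConnectX NIC, assuming the vendor's mlnx_qos tool (shipped with MLNX_OFED) and the common convention of carrying RoCE traffic on priority 3; the interface name, tool, and priority mapping are illustrative and vary by vendor and fabric.

```shell
# Sketch: make priority 3 lossless for RoCE v2 on a ConnectX NIC.
# Assumes MLNX_OFED's mlnx_qos tool; eth0 and priority 3 are illustrative.

# Trust DSCP markings so RoCE packets land in the intended traffic class.
mlnx_qos -i eth0 --trust dscp

# Enable Priority Flow Control on priority 3 only (the usual RoCE priority);
# all other priorities stay lossy.
mlnx_qos -i eth0 --pfc 0,0,0,1,0,0,0,0

# The switch side must mirror this: PFC on the same priority end-to-end,
# plus ECN marking thresholds so DCQCN congestion control can react.
```

A mismatch anywhere on the path (host NIC, top-of-rack, spine) is exactly what produces the "worse than TCP" behavior described above.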

Key Connections
  • enables RDMA-Accelerated Object Access — the transport protocol for microsecond object access
  • enables GPU-Direct Storage Pipeline — direct storage-to-GPU data path
  • scoped_to Object Storage — underlying transport for high-performance storage

Definition

What it is

Network transport protocols enabling Remote Direct Memory Access — transferring data directly between application memory on different servers without involving the CPU or OS kernel, achieving microsecond-level latency. In early 2026, NVIDIA released RDMA client and server libraries for S3-compatible storage as part of the CUDA Toolkit, transitioning RDMA from niche technical preview to a standard component of "AI Factory" infrastructure.

Why it exists

HTTP/TCP-based S3 access introduces millisecond-scale latency. For internal object storage data paths (inter-node replication, erasure coding reconstruction), RDMA eliminates protocol overhead, enabling storage fabric performance closer to local memory access.
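To see why the fixed protocol overhead dominates, here is a back-of-the-envelope latency model in Python. The overhead figures (~1 ms for an HTTP/TCP S3 request, ~10 µs for a one-sided RDMA read, a 100 Gb/s link) are illustrative assumptions for the sketch, not measurements.

```python
# Illustrative latency budget: fixed per-request overhead vs. wire transfer time.
# All constants below are rough assumptions, not measured values.
TCP_S3_OVERHEAD_US = 1000.0   # ~1 ms/request: HTTP parsing, kernel TCP, data copies
RDMA_OVERHEAD_US = 10.0       # ~10 us/request: one-sided RDMA read
LINK_GBPS = 100.0             # same 100 Gb/s fabric in both cases

def access_time_us(size_bytes: float, overhead_us: float) -> float:
    """Per-request latency = fixed protocol overhead + wire transfer time."""
    transfer_us = size_bytes * 8 / (LINK_GBPS * 1000)  # Gb/s -> bits per microsecond
    return overhead_us + transfer_us

for size in (4 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    tcp = access_time_us(size, TCP_S3_OVERHEAD_US)
    rdma = access_time_us(size, RDMA_OVERHEAD_US)
    print(f"{size:>9} B  tcp={tcp:9.1f} us  rdma={rdma:9.1f} us  speedup={tcp/rdma:5.1f}x")
```

Under these assumptions, small-object access (the replication and erasure-coding metadata traffic named below) sees close to a 100x difference, while large bulk transfers are bandwidth-bound and the gap shrinks toward 1x.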

Primary use cases

High-performance inter-node replication, erasure-coding reconstruction acceleration, AI/ML storage fabric, low-latency data movement within storage clusters, NVIDIA CUDA Toolkit GPU-direct S3 access.
