Data Loading Bottleneck

Summary

What it is

The phenomenon in which AI training and inference workloads leave GPUs idle while they wait for object storage to deliver the next batch of training data, a checkpoint, or RAG retrieval results. This turns a compute-bound workload into a storage-bound one.
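The model below makes the compute-bound vs storage-bound distinction concrete. It is an idealized per-step sketch that assumes one loader whose next fetch can fully overlap the current GPU step when prefetching is on. `StepProfile` and the millisecond figures are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class StepProfile:
    """Per-step timings in milliseconds (illustrative inputs, not measurements)."""
    load_ms: float     # time to fetch the next batch from object storage
    compute_ms: float  # time the GPU spends on one training step


def step_time_ms(p: StepProfile, prefetch: bool) -> float:
    # Idealized model (assumption): without prefetching the GPU waits for the
    # load and then computes, so the two add up. With perfect prefetching the
    # next load overlaps the current compute, so the slower of the two sets the pace.
    return max(p.load_ms, p.compute_ms) if prefetch else p.load_ms + p.compute_ms


def gpu_utilization(p: StepProfile, prefetch: bool) -> float:
    # Fraction of each step the GPU spends on useful compute.
    return p.compute_ms / step_time_ms(p, prefetch)


def is_storage_bound(p: StepProfile) -> bool:
    # Even with perfect overlap, the GPU idles whenever loading outpaces compute.
    return p.load_ms > p.compute_ms


if __name__ == "__main__":
    fast = StepProfile(load_ms=20, compute_ms=100)
    slow = StepProfile(load_ms=400, compute_ms=100)
    for name, p in [("fast storage", fast), ("slow storage", slow)]:
        print(name,
              f"serial={gpu_utilization(p, prefetch=False):.0%}",
              f"prefetch={gpu_utilization(p, prefetch=True):.0%}",
              "storage-bound" if is_storage_bound(p) else "compute-bound")
```

With the illustrative slow-storage numbers, prefetching raises utilization only from 20% to 25%. Overlap can hide load time only up to the length of the compute step. Once loading takes longer than compute, the remaining gap has to be closed on the storage path itself.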

Where it fits

The Data Loading Bottleneck is distinct from **Cold Scan Latency** (first-query latency on analytics workloads) and from **Legacy Ingestion Bottlenecks** (ETL throughput). It concerns steady-state read throughput from S3 to GPU HBM during a training run. It is reported to be the dominant cost driver in 2026 AI infrastructure. Data loading accounts for ~80% of training wall-clock time in hyperscaler workloads, and ~35% of compute time was wasted at Meta before its GPUDirect Storage 2.0 deployment.
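A back-of-the-envelope reading of those figures follows. Let f be the fraction of wall-clock time the GPUs spend waiting on data. This is an interpretation of the reported percentages, not a definition taken from the cited sources. GPU utilization is then capped at 1 - f, and eliminating the wait entirely bounds the achievable speedup in Amdahl fashion:

```latex
\begin{aligned}
U_{\mathrm{GPU}} &\le 1 - f, & S_{\max} &= \frac{1}{1 - f} \\
f = 0.80 &\;\Rightarrow\; U_{\mathrm{GPU}} \le 0.20, & S_{\max} &= 5 \\
f = 0.35 &\;\Rightarrow\; U_{\mathrm{GPU}} \le 0.65, & S_{\max} &\approx 1.54
\end{aligned}
```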

Misconceptions / Traps

A high p50 GET latency does not by itself cause this. What kills GPU utilization is p99 tail latency in synchronous training loops, where the slowest worker gates the next step.
"Just use Express One Zone" is only half the answer. S3 Express One Zone reduces first-byte latency, but per-GPU throughput still depends on how data travels from S3 to GPU HBM: through a CPU bounce buffer, over RDMA, or from a cache tier.
Profiling must be GPU-aware. S3 metrics alone hide the bottleneck. The diagnostic that actually identifies it is GPU-side: utilization sampled with nvidia-smi during training, read together with a per-step split of time spent waiting on data versus computing.
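The p99 point above can be checked with a small simulation. It assumes a synchronous step that cannot start until every worker has its batch, so each step takes as long as the slowest worker's fetch. The lognormal latency distribution, worker count, and seed are illustrative assumptions, not measurements of S3.

```python
import math
import random
import statistics


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile of samples, with q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def simulate_steps(workers: int, steps: int, seed: int = 0) -> tuple[list[float], list[float]]:
    """Return (per-request latencies, per-step times), both in milliseconds.

    Assumptions (illustrative, not S3 measurements): each worker issues one GET
    per step, and latencies are lognormal with a median of about 20 ms and a
    heavy right tail. A synchronous step completes only when the slowest
    worker's GET returns.
    """
    rng = random.Random(seed)
    requests: list[float] = []
    step_times: list[float] = []
    for _ in range(steps):
        batch = [rng.lognormvariate(3.0, 0.8) for _ in range(workers)]  # median = e**3, about 20 ms
        requests.extend(batch)
        step_times.append(max(batch))  # the slowest worker gates the whole step
    return requests, step_times


if __name__ == "__main__":
    reqs, step_times = simulate_steps(workers=64, steps=2_000)
    print(f"GET p50:     {percentile(reqs, 50):.1f} ms")
    print(f"GET p99:     {percentile(reqs, 99):.1f} ms")
    print(f"median step: {statistics.median(step_times):.1f} ms")
```

With 64 workers, the chance that a step avoids the slowest 1% of GETs entirely is 0.99^64 ≈ 0.53. Roughly half of all steps therefore wait on at least one p99-tail request, and the median step lands near the GET p99 rather than the p50.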
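Here is a minimal, framework-agnostic sketch of the data-vs-compute split. It assumes a loop that pulls batches from an iterator, and it times how long each iteration blocks on `next()` against how long the step itself takes. `profile_loop` is a hypothetical helper, not an existing profiler API. A high `data_fraction` combined with low GPU utilization in nvidia-smi points at the loader rather than the model.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def profile_loop(batches: Iterable[T], step_fn: Callable[[T], None]) -> dict[str, float]:
    """Run a loop and split wall-clock time into data-wait vs compute.

    Hypothetical helper, not an existing profiler API. Assumes step_fn returns
    only after its work has finished. With asynchronous GPU kernels, synchronize
    the device before step_fn returns, or the split will not reflect where the
    GPU actually spent its time.
    """
    data_wait = compute = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks until the loader delivers the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # forward, backward, optimizer step
        t2 = time.perf_counter()
        data_wait += t1 - t0
        compute += t2 - t1
    total = data_wait + compute
    return {
        "data_wait_s": data_wait,
        "compute_s": compute,
        "data_fraction": data_wait / total if total else 0.0,
    }


if __name__ == "__main__":
    def slow_loader(n: int, delay_s: float):
        for i in range(n):
            time.sleep(delay_s)  # stand-in for a slow object-storage read
            yield i

    # The loader takes about 10x longer than the step, so data_fraction comes out high.
    print(profile_loop(slow_loader(5, 0.02), lambda batch: time.sleep(0.002)))
```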

Key Connections

GPU-Direct Storage Pipeline solves Data Loading Bottleneck
NVIDIA GPUDirect RDMA for S3 solves Data Loading Bottleneck
Alluxio solves Data Loading Bottleneck (the cache-tier answer)
Tiered Storage solves Data Loading Bottleneck (when paired with NVMe scratch)
scoped_to Object Storage, Object Storage for AI Data Pipelines, S3
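The cache-tier and NVMe-scratch answers share one mechanism. A read-through cache serves hot objects from fast local storage and goes to S3 only on a miss. The sketch below is a generic LRU read-through cache, not Alluxio's implementation. `fetch` is a hypothetical stand-in for an S3 GET, and `scratch_dir` is a hypothetical stand-in for an NVMe mount.

```python
import hashlib
import os
import tempfile
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """LRU read-through cache on local scratch in front of an object store.

    Assumptions: `fetch` is a hypothetical stand-in for an object-store GET
    (for example, a wrapper around an S3 client), and `scratch_dir` stands in
    for a local NVMe mount. Capacity is tracked in bytes.
    """

    def __init__(self, fetch: Callable[[str], bytes], scratch_dir: str, capacity_bytes: int):
        self.fetch = fetch
        self.scratch_dir = scratch_dir
        self.capacity = capacity_bytes
        self.used = 0
        self.lru: "OrderedDict[str, int]" = OrderedDict()  # key -> size, least recent first
        self.hits = 0
        self.misses = 0
        os.makedirs(scratch_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash object keys so any name maps to a safe local file name.
        return os.path.join(self.scratch_dir, hashlib.sha256(key.encode()).hexdigest())

    def get(self, key: str) -> bytes:
        if key in self.lru:  # fast path: serve from local scratch
            self.hits += 1
            self.lru.move_to_end(key)
            with open(self._path(key), "rb") as f:
                return f.read()
        self.misses += 1
        data = self.fetch(key)  # slow path: go to the object store
        if len(data) <= self.capacity:  # objects larger than the cache are not cached
            while self.used + len(data) > self.capacity:
                old_key, old_size = self.lru.popitem(last=False)  # evict least recently used
                os.remove(self._path(old_key))
                self.used -= old_size
            with open(self._path(key), "wb") as f:
                f.write(data)
            self.lru[key] = len(data)
            self.used += len(data)
        return data


if __name__ == "__main__":
    def fake_get(key: str) -> bytes:
        return key.encode() * 1000  # stand-in for a slow S3 GET (7,000 bytes here)

    with tempfile.TemporaryDirectory() as scratch:
        cache = ReadThroughCache(fake_get, scratch, capacity_bytes=20_000)
        for epoch in range(3):
            for shard in ["shard-0", "shard-1"]:
                cache.get(shard)
        print(f"hits={cache.hits} misses={cache.misses}")  # expected: hits=4 misses=2
```

Multi-epoch training rereads the same shards. Once the working set fits in scratch, later epochs hit local NVMe instead of S3. If the working set is larger than the cache, plain LRU over a repeated sequential scan evicts every object before it is reused. That is one reason real cache tiers add prefetching and smarter admission policies.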

Resources

Blog · Medium

introl.com/blog/object-storage-ai-gpu-direct-storage-200gb-t...

Quantifies the end state of solving this pain point: 192–200 GB/s of sustained throughput from S3 to GPU memory via GPUDirect Storage 2.0.

Docs · High

www.solidigm.com/products/technology/accelerating-ai-with-hi...

Solidigm's analysis of the storage-to-GPU pipeline bottleneck. It independently corroborates the ~80% data-loading share of training wall-clock time that motivates the entire AI-data-pipeline architecture cluster.

Docs · High

www.alluxio.io/

Alluxio's case-study material, including the 10× GPU-loading benchmarks from the Uber, Shopee, and AliPay deployments. It is primary evidence that this is a real, quantifiable pain point.