Data Loading Bottleneck
Summary
The phenomenon where AI training and inference workloads sit GPU-idle waiting on object storage to deliver the next batch of training data, checkpoints, or RAG retrieval results — turning a compute-bound workload into a storage-bound one.
Distinct from **Cold Scan Latency** (first-query analytics latency) and from **Legacy Ingestion Bottlenecks** (ETL throughput). This is specifically about steady-state read throughput from S3 to GPU HBM during training. Empirically the dominant cost driver in 2026 AI infrastructure: ~80% of training wall-clock in hyperscaler workloads, and ~35% of compute time wasted at Meta before GPUDirect Storage 2.0 was deployed.
- A high p50 GET latency does not by itself cause this — what kills GPU utilization is p99 tail latency in synchronous training loops, where the slowest worker gates the next step (see the first sketch after this list).
- "Just use Express One Zone" is half the answer. Express One Zone reduces first-byte latency, but throughput per GPU still depends on how data flows from S3 to GPU HBM (CPU bounce vs RDMA vs cache).
- Profiling tools must be GPU-aware. Looking at S3 metrics alone hides the bottleneck — an nvidia-smi utilization trace paired with a per-step data-vs-compute breakdown is the diagnostic that actually identifies it (see the third sketch after this list).
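A minimal sketch of why tail latency, not median latency, gates synchronous steps: each step waits for the slowest of N data-parallel workers, so the probability that at least one worker lands in the p99 tail grows as 1 - 0.99^N. The GET-latency distribution below is an illustrative assumption, not an S3 measurement.

```python
import random

# Illustrative GET latency distribution (ms): p50 around 30 ms, a 1% tail around 400 ms.
# These numbers are assumptions for the sketch, not measurements.
def sample_get_latency_ms():
    return 400.0 if random.random() < 0.01 else random.gauss(30.0, 5.0)

def mean_step_wait_ms(num_workers, steps=5000):
    """Synchronous data parallelism: every step waits for the slowest worker's fetch."""
    total = 0.0
    for _ in range(steps):
        total += max(sample_get_latency_ms() for _ in range(num_workers))
    return total / steps

for n in (1, 8, 64, 512):
    # Chance that at least one of n workers hits the 1% tail: 1 - 0.99**n
    print(f"{n:4d} workers: mean step wait ~{mean_step_wait_ms(n):6.1f} ms, "
          f"P(tail hit) = {1 - 0.99**n:.2f}")
```

At one worker the mean wait tracks p50; at hundreds of workers nearly every step pays the p99 price.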
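A back-of-envelope sketch of the data-path point: once prefetching hides first-byte latency, the question is whether the delivery path's sustained bandwidth can feed each step before the GPU finishes computing. All bandwidth and batch-size figures below are illustrative placeholders, not measurements of any specific system.

```python
# Illustrative sustained bandwidth per GPU (GB/s) for three delivery paths.
# Placeholder numbers for the sketch; substitute measured values.
PATHS_GBPS = {
    "CPU bounce (S3 -> host RAM -> PCIe -> HBM)": 3.0,
    "GPUDirect / RDMA (S3 -> NIC -> HBM)": 20.0,
    "Local NVMe cache hit": 10.0,
}

def step_time_s(batch_gb, compute_s, path_gbps):
    """With prefetching, a step takes max(compute time, data delivery time)."""
    return max(compute_s, batch_gb / path_gbps)

batch_gb, compute_s = 2.0, 0.25   # 2 GB of samples per step, 250 ms of GPU work
for path, gbps in PATHS_GBPS.items():
    t = step_time_s(batch_gb, compute_s, gbps)
    print(f"{path:45s} step {t*1000:6.0f} ms, GPU busy {compute_s / t:.0%}")
```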
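A minimal, framework-agnostic sketch of the data-vs-compute breakdown: time how long the loop blocks waiting for the next batch versus how long the step computation takes. The `loader` and `train_step` arguments are placeholders for whatever pipeline is being profiled; pairing the output with an nvidia-smi utilization trace confirms the idle gaps from the device side.

```python
import time

def profile_data_vs_compute(loader, train_step, num_steps=100):
    """Split each step's wall-clock time into data wait vs compute.

    loader     -- any iterable yielding batches (e.g. a DataLoader reading from S3)
    train_step -- callable running forward/backward/optimizer on one batch
    """
    data_s, compute_s = 0.0, 0.0
    it = iter(loader)
    for _ in range(num_steps):
        t0 = time.perf_counter()
        batch = next(it)        # blocks while the next batch is fetched
        t1 = time.perf_counter()
        train_step(batch)       # GPU work (synchronize inside train_step if using CUDA)
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    frac = data_s / (data_s + compute_s)
    print(f"data wait {data_s:.1f}s, compute {compute_s:.1f}s "
          f"-> {frac:.0%} of step time is data loading")
```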
Connections
- GPU-Direct Storage Pipeline solves Data Loading Bottleneck
- NVIDIA GPUDirect RDMA for S3 solves Data Loading Bottleneck
- Alluxio solves Data Loading Bottleneck — the cache-tier answer
- Tiered Storage solves Data Loading Bottleneck — when paired with NVMe scratch (see the sketch after this list)
- scoped_to Object Storage for AI Data Pipelines, S3
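A minimal sketch of the tiered-storage pattern from the last connection, assuming a boto3 client and an NVMe mount at a hypothetical /nvme/scratch path: a read-through cache that serves repeat reads locally and only issues S3 GETs on a miss.

```python
import os
import boto3

class NvmeReadThroughCache:
    """Serve objects from local NVMe scratch; fall back to S3 only on a miss."""

    def __init__(self, bucket, scratch_dir="/nvme/scratch"):
        self.bucket = bucket
        self.scratch_dir = scratch_dir
        self.s3 = boto3.client("s3")
        os.makedirs(scratch_dir, exist_ok=True)

    def get(self, key: str) -> bytes:
        local_path = os.path.join(self.scratch_dir, key.replace("/", "_"))
        if os.path.exists(local_path):       # hit: NVMe read, no S3 round trip
            with open(local_path, "rb") as f:
                return f.read()
        obj = self.s3.get_object(Bucket=self.bucket, Key=key)   # miss: GET from S3
        data = obj["Body"].read()
        with open(local_path, "wb") as f:    # populate the scratch tier
            f.write(data)
        return data

# Epoch 1 pays S3 latency once per shard; later epochs read from NVMe.
# cache = NvmeReadThroughCache("my-training-bucket")   # hypothetical bucket name
# shard = cache.get("dataset/shard-00042.tar")
```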
Resources
- Quantifies the end-state of solving this pain point — 192–200 GB/s sustained throughput from S3 to GPU memory via GPUDirect Storage 2.0.
- Solidigm's analysis of the storage-to-GPU pipeline bottleneck — independently corroborates the ~80% data-loading wall-clock figure that motivates the entire AI-data-pipeline architecture cluster.
- Alluxio's case-study material with the 10× GPU-loading benchmarks from Uber, Shopee, and AliPay deployments — primary evidence that this is a real, quantifiable pain point.