Pain Point

Data Loading Bottleneck

The phenomenon where AI training and inference workloads sit GPU-idle waiting on object storage to deliver the next batch of training data, checkpoints, or RAG retrieval results — turning a compute-bound workload into a storage-bound one.


Summary

Where it fits

Distinct from **Cold Scan Latency** (first-query latency in analytics) and from **Legacy Ingestion Bottlenecks** (ETL throughput). This is specifically about steady-state read throughput from S3 to GPU HBM during training. Empirically it is the dominant cost driver in 2026 AI infrastructure: reportedly ~80% of training wall-clock time in hyperscaler workloads, and ~35% of compute time wasted at Meta before GPUDirect Storage 2.0 was deployed.
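Ratios like those above fall out of a simple pipeline model: when fetch and compute run serially, the idle fraction is fetch over total; with prefetch, only fetch time in excess of compute time is exposed. A minimal back-of-envelope sketch (the function name and numbers are illustrative, not from the source):

```python
def gpu_idle_fraction(compute_s: float, fetch_s: float, overlap: bool) -> float:
    """Back-of-envelope model of the GPU-idle fraction per training step.

    compute_s: forward+backward time for one batch on the GPU
    fetch_s:   time to pull one batch from object storage
    overlap:   whether fetching is pipelined with compute (prefetch)
    """
    if not overlap:
        # Serial loop: the GPU waits out the full fetch before every step.
        return fetch_s / (fetch_s + compute_s)
    # Pipelined loop: the fetch is hidden unless it is slower than compute.
    step = max(fetch_s, compute_s)
    return (step - compute_s) / step

# A 1 s compute step gated by a 4 s un-overlapped fetch leaves the GPU
# 80% idle — the kind of ratio the figures above describe.
print(round(gpu_idle_fraction(1.0, 4.0, overlap=False), 2))  # → 0.8
```

The same model shows why prefetch alone is not enough: once `fetch_s` exceeds `compute_s`, deeper prefetch queues cannot hide the gap, and only higher read throughput (or a cache tier) closes it.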

Misconceptions / Traps
  • A high p50 GET latency does not by itself cause this — what kills GPU utilization is p99 tail latency in synchronous training loops where the slowest worker gates the next step.
  • "Just use Express One Zone" is half the answer. Express One Zone reduces first-byte latency, but throughput per GPU still depends on how data flows from S3 to GPU HBM (CPU bounce vs RDMA vs cache).
  • Profiling must be GPU-aware. S3-side metrics alone hide the bottleneck; the diagnostic that actually identifies it is a data-vs-compute time breakdown on the GPU side (e.g. GPU utilization from nvidia-smi cross-checked against dataloader wait time).
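One way to get that data-vs-compute breakdown is to time the loader wait and the compute step separately inside the training loop. A minimal sketch, assuming a Python iterable loader (`profile_loop` and `train_step` are hypothetical names; accurate GPU timing would also require a device synchronize before each timestamp):

```python
import time

def profile_loop(batches, train_step):
    """Split one epoch's wall-clock into 'waiting on data' vs 'computing'.

    batches:    any iterable yielding batches (e.g. a DataLoader)
    train_step: callable running one optimizer step on a batch
    """
    data_s = compute_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent blocked on the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)             # time spent in compute
        data_s += t1 - t0
        compute_s += time.perf_counter() - t1
    return data_s, compute_s
```

If `data_s` dominates `compute_s`, the workload is exhibiting exactly this pain point, regardless of what the S3-side dashboards show.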
Key Connections
  • GPU-Direct Storage Pipeline solves Data Loading Bottleneck
  • NVIDIA GPUDirect RDMA for S3 solves Data Loading Bottleneck
  • Alluxio solves Data Loading Bottleneck — the cache-tier answer
  • Tiered Storage solves Data Loading Bottleneck — when paired with NVMe scratch
  • scoped_to Object Storage for AI Data Pipelines, S3
