The Single-Digit Millisecond S3
Problem Framing
The 2025-2026 S3 ecosystem has shattered the assumption that object storage is inherently slow. S3 Express One Zone delivers single-digit millisecond first-byte latency. RDMA-accelerated storage eliminates TCP overhead for AI training. S3 Directory Buckets remove the listing bottleneck. Together, these technologies close the performance gap between object storage and local NVMe for specific workload patterns. Engineers building latency-sensitive AI, analytics, and caching workloads need to understand which performance levers exist, how they compose, and what trade-offs they carry.
Relevant Nodes
- Topics: S3, Object Storage
- Technologies: S3 Express One Zone, AWS S3, Apache Spark, Trino
- Standards: S3 API, S3 Directory Bucket
- Architectures: Separation of Storage and Compute, Lakehouse Architecture
- Pain Points: Cold Scan Latency, Object Listing Performance, Vendor Lock-In
Decision Path
S3 Express One Zone — the latency lever:
- Delivers single-digit ms first-byte latency (vs. ~100ms for S3 Standard).
- Requires S3 Directory Buckets (hierarchical namespace, single-AZ).
- Use for: ML checkpointing, Spark/Trino cache tier, interactive analytics hot data.
- Skip when: you need multi-AZ durability, or your workload is throughput-bound (Express optimizes latency, not bandwidth).
S3 Directory Buckets — the metadata lever:
- A hierarchical namespace with true directories; LIST operations are fast and scoped to a single directory.
- Eliminates the prefix-scan bottleneck that limits S3 Standard at millions of objects.
- Required for Express One Zone. Also beneficial independently for workloads with deep hierarchies.
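The scoped-LIST behavior above can be sketched with a standard ListObjectsV2 call using `Prefix` and `Delimiter`; in a hierarchical namespace this touches only one directory rather than scanning a flat keyspace. The injected client and helper name are assumptions of this sketch:

```python
# Sketch: scoped listing in a directory bucket. Prefix + Delimiter
# returns only the immediate children of one directory.

def list_directory(s3_client, bucket: str, directory: str):
    """Return (subdirectories, objects) directly under `directory`.

    `s3_client` is any object with a boto3-style list_objects_v2;
    injected so the sketch stays testable offline.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    resp = s3_client.list_objects_v2(
        Bucket=bucket, Prefix=prefix, Delimiter="/"
    )
    subdirs = [p["Prefix"] for p in resp.get("CommonPrefixes", [])]
    objects = [o["Key"] for o in resp.get("Contents", [])]
    return subdirs, objects
```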
RDMA for S3 — the I/O lever:
- Remote Direct Memory Access (via RoCE v2) bypasses the host TCP/IP software stack; RoCE v2 itself is routable over UDP/IP, but the data path avoids kernel networking and extra memory copies.
- Reduces latency from milliseconds to microseconds for GPU-to-storage transfers.
- Requires specific hardware: RDMA-capable NICs, compatible switches, supported S3 providers.
- Use for: AI training clusters where data loading is the bottleneck, high-frequency analytics.
Composing the performance stack:
- Tier 1 (most accessible): S3 Express One Zone as a cache tier for S3 Standard. Copy hot data to Express, query from there, keep cold data in Standard. Published Spark benchmarks report roughly a 38% runtime reduction.
- Tier 2 (moderate complexity): Add Directory Buckets for metadata-heavy workloads with millions of objects.
- Tier 3 (specialized): RDMA-accelerated storage for AI training nodes. Requires infrastructure investment but eliminates the I/O bottleneck.
Cost-performance framework:
- S3 Express is ~5-8x more expensive per GB than S3 Standard. Use it as a cache, not primary storage.
- Directory Buckets have different pricing (per-request + per-GB, no free tier). Model costs before migration.
- RDMA requires hardware investment. ROI is clearest for large-scale AI training where data loading time directly impacts GPU utilization.
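The storage side of the framework above reduces to simple arithmetic. A sketch with placeholder prices (illustrative assumptions only, not current AWS list prices, and request charges are omitted):

```python
# Illustrative cost model for the cache-tier decision. Both per-GB
# prices below are hypothetical placeholders chosen to reflect the
# ~5-8x premium; check current pricing before modeling a migration.

STANDARD_PER_GB = 0.023  # assumed $/GB-month
EXPRESS_PER_GB = 0.16    # assumed $/GB-month (~7x Standard)

def monthly_storage_cost(total_gb: float, hot_fraction: float) -> float:
    """All data in Standard, plus a hot subset duplicated into Express."""
    hot_gb = total_gb * hot_fraction
    return total_gb * STANDARD_PER_GB + hot_gb * EXPRESS_PER_GB
```

Caching 5% of a 100 TB dataset pays the Express premium only on 5 TB, which is why the cache-not-primary-storage rule holds even at a large per-GB multiple.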
What Changed Over Time
- S3 Standard was designed for durability and cost, not latency. First-byte latency of ~100ms was accepted as the price of scale.
- S3 Express One Zone (2023) was the first acknowledgment that object storage needed a latency-optimized tier.
- S3 Directory Buckets introduced hierarchical namespaces, breaking the flat-namespace assumption that had defined S3 since 2006.
- RDMA-accelerated S3 (2025-2026) represents the convergence of high-performance computing and object storage, driven by AI training demand.
- The pattern: S3 is fragmenting into specialized tiers (durability-optimized, latency-optimized, throughput-optimized) rather than remaining a single general-purpose service.
Sources
- aws.amazon.com/s3/storage-classes/express-one-zone/
- docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overvi...
- blogs.nvidia.com/blog/s3-compatible-ai-storage/
- community.cloudera.com/t5/What-s-New-Cloudera/Performance-comparison-o...
- udaara.medium.com/the-great-s3-showdown-express-one-zone-vs-standard-8...
- www.warpstream.com/blog/warpstream-s3-express-one-zone-benchmark-and-t...