Cold Scan Latency
Slow first-query performance against S3-stored data, caused by object discovery, metadata fetching, and data transfer over HTTP.
Summary
Cold scan latency is the fundamental performance trade-off of the separation-of-storage-and-compute pattern: every query that reads from S3 pays network overhead that a query against local disk does not.
- Cold scan latency is not the same as S3 being slow. S3 throughput is high, but the initial latency of each request is roughly 50-100 ms. For queries that touch many files, these per-request latencies add up unless the requests are issued in parallel.
- Caching helps with repeat queries but not with the first query. True cold scan mitigation requires metadata-driven pruning (table formats) and intelligent prefetching.
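A back-of-the-envelope sketch of why per-request latency adds up. It assumes one GET per file, a fixed per-request latency in the 50-100 ms range above, and requests issued in waves of a fixed concurrency; transfer time, throttling, and retries are ignored. The function name and parameters are illustrative, not from any library.

```python
import math


def request_latency_ms(num_files: int, per_request_ms: float, concurrency: int = 1) -> float:
    """Estimate only the latency-bound part of a cold scan.

    Assumes one GET per file, a fixed per-request latency, and requests
    issued in waves of `concurrency` parallel GETs. Transfer time is ignored.
    """
    if num_files < 0 or per_request_ms < 0 or concurrency < 1:
        raise ValueError("invalid arguments")
    waves = math.ceil(num_files / concurrency)
    return waves * per_request_ms


# 2,000 files at 75 ms per request, fetched one at a time vs. 64 at a time.
sequential = request_latency_ms(2_000, 75.0)
parallel = request_latency_ms(2_000, 75.0, concurrency=64)
print(f"sequential: {sequential / 1000:.1f} s")      # sequential: 150.0 s
print(f"64-way parallel: {parallel / 1000:.2f} s")  # 32 waves -> 2.40 s
```

Parallelism hides latency but does not remove it: even at high concurrency, each wave still costs one full round-trip, which is why pruning the number of files matters as much as fetching them quickly.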
- Apache Parquet solves Cold Scan Latency: columnar layout enables predicate pushdown.
- Lakehouse Architecture and Hybrid S3 + Vector Index solve Cold Scan Latency: metadata-driven access.
- Separation of Storage and Compute is constrained by Cold Scan Latency: an inherent trade-off.
- StarRocks is constrained by Cold Scan Latency: first-query performance is limited by S3 access.
- Scoped to: S3, Object Storage.
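The Parquet and lakehouse entries credit predicate pushdown and metadata-driven access with reducing cold-scan work. A minimal sketch of the shared idea, min/max pruning, assuming per-file min/max bounds on a timestamp column are available from metadata before any data file is opened. `FileStats`, `prune`, and the `s3://bucket/...` paths are hypothetical. Real engines also apply the same test at finer granularity (Parquet keeps column statistics per row group in the file footer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file column bounds, as kept in table-format metadata."""
    path: str
    min_ts: int
    max_ts: int


def prune(files: list[FileStats], lo: int, hi: int) -> list[str]:
    """Return paths whose [min_ts, max_ts] range can overlap lo <= ts <= hi.

    Files that fail the overlap test cannot contain matching rows,
    so they never have to be fetched from object storage.
    """
    return [f.path for f in files if f.max_ts >= lo and f.min_ts <= hi]


files = [
    FileStats("s3://bucket/t/part-0.parquet", 0, 99),
    FileStats("s3://bucket/t/part-1.parquet", 100, 199),
    FileStats("s3://bucket/t/part-2.parquet", 200, 299),
]
print(prune(files, 150, 210))
# ['s3://bucket/t/part-1.parquet', 's3://bucket/t/part-2.parquet']
```

Pruning turns a cold scan of every file into a cold scan of only the candidates, which shrinks both the request count and the bytes transferred.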
Definition
The delay experienced on the first query against S3-stored data, caused by object discovery (listing), metadata fetching, and data transfer over the network.
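The three phases in this definition run partly in sequence: listing must finish before the engine knows which footers to read, and footers must be read before it knows which byte ranges to fetch. A rough latency model under stated assumptions: one footer GET and one data GET per file read, a fixed per-request latency, and no transfer time. The function name and inputs are illustrative; the only hard number is the 1,000-key maximum per S3 ListObjectsV2 call.

```python
import math

LIST_PAGE_SIZE = 1_000  # ListObjectsV2 returns at most 1,000 keys per call


def cold_scan_estimate_ms(num_objects: int, files_read: int,
                          request_ms: float, concurrency: int) -> dict[str, float]:
    """Rough latency breakdown of a cold scan (illustrative model only).

    - Listing: pages are fetched one after another, because each call needs
      the continuation token returned by the previous one.
    - Metadata: one footer GET per file read, issued `concurrency` at a time.
    - Data: one GET per file read, also issued `concurrency` at a time.
    Every request costs `request_ms`; transfer time is ignored.
    """
    list_ms = math.ceil(num_objects / LIST_PAGE_SIZE) * request_ms
    waves = math.ceil(files_read / concurrency)
    metadata_ms = waves * request_ms
    data_ms = waves * request_ms
    return {"list": list_ms, "metadata": metadata_ms, "data": data_ms,
            "total": list_ms + metadata_ms + data_ms}


print(cold_scan_estimate_ms(num_objects=50_000, files_read=500,
                            request_ms=75.0, concurrency=32))
# {'list': 3750.0, 'metadata': 1200.0, 'data': 1200.0, 'total': 6150.0}
```

In this model, sequential listing dominates for large prefixes. A table format whose metadata already records the file paths lets the engine skip the listing phase, which is one concrete reason metadata-driven access helps.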
Recent developments
- CloudTS (FAST 2026): compacted timeseries metadata to reduce access amplification. Per the USENIX FAST 2026 paper "An Efficient Cloud Storage Model with Compacted Timeseries Metadata", CloudTS proposes managing metadata and data separately on cloud storage to reduce access amplification on cold queries. This is concrete, research-grade evidence that the cold-scan-latency problem is being attacked at the data-model layer, not just by caching.
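A generic illustration of access amplification, not CloudTS's actual design: it only counts object-store requests for a time-range query under two hypothetical layouts. In the first, each chunk object carries its own time bounds, so every chunk's header must be read (assumed to be a separate ranged GET) before the matching bodies are fetched. In the second, all chunk bounds live in one separate, compacted metadata object. `Chunk`, the key names, and both counting functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical timeseries chunk stored as one object covering [start, end]."""
    key: str
    start: int
    end: int


def overlaps(c: Chunk, lo: int, hi: int) -> bool:
    return c.end >= lo and c.start <= hi


def requests_inline_metadata(chunks: list[Chunk], lo: int, hi: int) -> int:
    # Bounds live inside each object: one header GET per chunk,
    # then one body GET per matching chunk.
    return len(chunks) + sum(overlaps(c, lo, hi) for c in chunks)


def requests_compacted_metadata(chunks: list[Chunk], lo: int, hi: int) -> int:
    # Bounds for all chunks live in one metadata object: one GET for it,
    # then one body GET per matching chunk.
    return 1 + sum(overlaps(c, lo, hi) for c in chunks)


# 720 hourly chunks (30 days); query the first two hours.
chunks = [Chunk(f"series/chunk-{i:04d}", i * 3600, (i + 1) * 3600 - 1) for i in range(720)]
lo, hi = 0, 2 * 3600 - 1
print(requests_inline_metadata(chunks, lo, hi))     # 722
print(requests_compacted_metadata(chunks, lo, hi))  # 3
```

With inline metadata, request count grows with total data size; with compacted metadata, it grows with the size of the answer.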
- Apache Doris 4.1: cold-query optimization with a reported 90% object-storage cost reduction. Per VeloDB's Apache Doris 4.1 announcement, Doris 4.1 delivers unified storage and retrieval for AI workloads with cold-query optimization and reports a 90% reduction in object-storage cost. The pattern across engines in 2026 is that Cold Scan Latency is increasingly mitigated through tighter coupling between table-format metadata and engine-side caching, rather than through per-engine NVMe cache layers alone.
- Practitioner mitigation: a warm cache of 8 GB RAM plus 140 GB NVMe per node holds weeks of timeseries data. Per OpenData's Prometheus-on-object-storage write-up, on an r5d.xlarge node an 8 GB RAM plus 140 GB NVMe disk cache keeps several weeks of timeseries data locally warm, while cold queries pay a 10–100 ms object-store round-trip. This sets the empirical bounds: queries served from the warm cache approach local-NVMe latency, while truly cold reads still cost tens of milliseconds per round-trip at best.
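A minimal sketch of the two-tier warm cache described above. It assumes an LRU policy in both tiers and capacities counted in entries rather than the byte budgets (8 GB RAM, 140 GB NVMe) from the write-up; the eviction policy and layout of that setup are not specified here. Both tiers are simulated in memory, `fetch` stands in for an object-store GET, and `LRU`, `TieredCache`, and the latency figures are hypothetical values used only for accounting.

```python
from collections import OrderedDict


class LRU:
    """Minimal LRU map; capacity is a number of entries, not bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class TieredCache:
    """Read-through cache: RAM tier, then NVMe tier, then the object store."""

    def __init__(self, ram_entries, nvme_entries, fetch, latency_ms):
        self.ram = LRU(ram_entries)
        self.nvme = LRU(nvme_entries)
        self.fetch = fetch              # stand-in for an object-store GET
        self.latency_ms = latency_ms    # assumed cost per tier, for accounting only
        self.elapsed_ms = 0.0
        self.hits = {"ram": 0, "nvme": 0, "s3": 0}

    def read(self, key):
        value = self.ram.get(key)
        if value is not None:
            tier = "ram"
        else:
            value = self.nvme.get(key)
            if value is not None:
                tier = "nvme"
            else:
                value = self.fetch(key)  # cold read: pays the round-trip
                tier = "s3"
                self.nvme.put(key, value)  # populate the disk tier on a cold miss
            self.ram.put(key, value)  # promote into RAM
        self.hits[tier] += 1
        self.elapsed_ms += self.latency_ms[tier]
        return value


cache = TieredCache(ram_entries=2, nvme_entries=8,
                    fetch=lambda k: f"data:{k}".encode(),
                    latency_ms={"ram": 0.001, "nvme": 0.1, "s3": 50.0})
for key in ["a", "b", "a", "b", "c", "a"]:
    cache.read(key)
print(cache.hits)  # {'ram': 2, 'nvme': 1, 's3': 3}
```

The run shows all three outcomes: first touches pay the object-store round-trip, repeat reads of the hot pair hit RAM, and "a" is served from NVMe after being evicted from RAM by "c". Only the first touch of each key is cold; every later read stays local as long as the disk tier is large enough.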
Resources
AWS's official S3 performance optimization guide covering request parallelization, prefix design, and throughput targets.
AWS documentation on S3 Intelligent-Tiering, explaining how automatic tier transitions affect retrieval latency for infrequently accessed data.