Pain Point

Cold Scan Latency

Slow first-query performance against S3-stored data, caused by object discovery, metadata fetching, and data transfer over HTTP.


Summary

What it is

Slow first-query performance against S3-stored data, caused by object discovery, metadata fetching, and data transfer over HTTP.

Where it fits

Cold scan latency is the fundamental performance trade-off of separating storage from compute. Every query against S3 starts with network round trips (listing objects, reading metadata, fetching data) that do not exist when querying local disk.

Misconceptions / Traps
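  • Cold scan latency is not the same as S3 being slow. S3 throughput is high, but each request carries roughly 50-100 ms of initial latency before the first byte arrives. For queries touching many files, especially when requests are issued one after another, this per-request cost adds up.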
  • Caching helps repeat queries but not the first one, because nothing is cached yet. Mitigating a true cold scan requires metadata-driven pruning (as provided by table formats), so fewer objects are read, and intelligent prefetching, so request latencies overlap.
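
How per-request latency "adds up" can be made concrete with a little arithmetic. The sketch below is a rough model, not a benchmark: the function name, the 75 ms latency (picked from the 50-100 ms range quoted above), and the concurrency of 64 are all assumptions chosen for illustration, and transfer time is ignored entirely.

```python
import math


def first_byte_overhead_s(num_files: int, latency_ms: float, concurrency: int = 1) -> float:
    """Estimate time spent waiting for first bytes, ignoring data transfer.

    Requests are modeled as running in waves of `concurrency` parallel GETs,
    with each wave costing one `latency_ms` round trip. This is a simplified
    model; real clients overlap requests less neatly.
    """
    if num_files < 0 or latency_ms < 0 or concurrency < 1:
        raise ValueError("num_files and latency_ms must be >= 0, concurrency >= 1")
    waves = math.ceil(num_files / concurrency)
    return waves * latency_ms / 1000.0


# Assumed workload: 1,000 files at 75 ms per request.
sequential = first_byte_overhead_s(1000, 75)               # one request at a time
parallel = first_byte_overhead_s(1000, 75, concurrency=64)  # 16 waves of up to 64 requests

print(f"sequential: {sequential:.1f} s, 64-way parallel: {parallel:.1f} s")
```

Under these assumptions the same 1,000 requests cost about 75 seconds when issued sequentially and a little over one second when issued 64 at a time, which is why the number of files touched and the degree of request overlap matter more than raw S3 throughput.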

Key Connections
  • Apache Parquet solves Cold Scan Latency — columnar layout enables predicate pushdown
  • Lakehouse Architecture, Hybrid S3 + Vector Index solves Cold Scan Latency — metadata-driven access
  • Separation of Storage and Compute constrained_by Cold Scan Latency — inherent trade-off
  • StarRocks constrained_by Cold Scan Latency — first-query limited by S3 access
  • scoped_to S3, Object Storage
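
The metadata-driven access mentioned in the connections above can be illustrated with a minimal sketch of min/max pruning. It assumes a manifest of per-file column statistics is already available without opening the data files, as table formats typically arrange; the `FileStats` class, the `prune` function, and the object keys are hypothetical names invented for this example and do not correspond to any specific library's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Per-file statistics for one filter column (hypothetical manifest entry)."""
    key: str        # object key in the bucket
    min_value: int  # smallest value of the column in this file
    max_value: int  # largest value of the column in this file


def prune(files: List[FileStats], lower: int, upper: int) -> List[FileStats]:
    """Keep only files whose [min, max] range overlaps the query range [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Assumed manifest: three files partitioned by a monotonically increasing id.
manifest = [
    FileStats("events/part-000.parquet", 0, 99),
    FileStats("events/part-001.parquet", 100, 199),
    FileStats("events/part-002.parquet", 200, 299),
]

# Query: WHERE id BETWEEN 120 AND 150
selected = prune(manifest, 120, 150)
print([f.key for f in selected])
```

Every file that is pruned here is a set of S3 requests that never has to be issued, which is how pruning reduces cold scan latency even when no cache is warm.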

Definition

What it is

The delay experienced on the first query against S3-stored data, caused by object discovery (listing), metadata fetching, and data transfer over the network.
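
The three contributors named in this definition can be separated in a back-of-the-envelope estimate. The sketch below is an illustrative model under stated assumptions: listing pages are fetched one after another (each ListObjectsV2 response returns at most 1,000 keys), one metadata request is issued per object in parallel waves, and data transfer is limited only by bandwidth. The function name and all input numbers are hypothetical.

```python
import math

LIST_PAGE_SIZE = 1000  # ListObjectsV2 returns at most 1,000 keys per response


def cold_scan_phases_s(num_objects: int, bytes_to_read: int, request_latency_s: float,
                       bandwidth_bytes_per_s: float, concurrency: int) -> dict:
    """Split an estimated cold scan into discovery, metadata, and transfer time."""
    if concurrency < 1 or bandwidth_bytes_per_s <= 0:
        raise ValueError("concurrency must be >= 1 and bandwidth must be > 0")
    # Discovery: at least one LIST call; pages under one prefix are modeled as sequential.
    list_pages = max(1, math.ceil(num_objects / LIST_PAGE_SIZE))
    discovery = list_pages * request_latency_s
    # Metadata: one request per object (for example a file footer), in parallel waves.
    metadata = math.ceil(num_objects / concurrency) * request_latency_s
    # Transfer: bytes actually read divided by available bandwidth.
    transfer = bytes_to_read / bandwidth_bytes_per_s
    return {"discovery": discovery, "metadata": metadata, "transfer": transfer}


# Assumed inputs: 5,000 objects, 2 GB read, 100 ms per request,
# 250 MB/s of bandwidth, 50 concurrent requests.
phases = cold_scan_phases_s(5000, 2_000_000_000, 0.1, 250_000_000, 50)
for name, seconds in phases.items():
    print(f"{name}: {seconds:.1f} s")
print(f"total: {sum(phases.values()):.1f} s")
```

In this made-up scenario, discovery and metadata fetching together take longer than moving the data itself, which matches the point that the delay is dominated by request overhead, not by throughput.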
