Object Listing Performance
The slowness and cost of listing large numbers of objects in S3's flat namespace using prefix-based scans. Paginated at 1,000 objects per request.
Summary
Object listing is the hidden bottleneck in S3 operations. Partition discovery, garbage collection, and table snapshots all start with listing — and at millions of objects, LIST calls dominate job startup time. Originates from: **S3 API**.
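To make the startup cost concrete, here is a back-of-the-envelope sketch of what one sequential listing pass costs at 1,000 objects per request. The 100 ms per-request latency and the $0.005 per 1,000 LIST requests are illustrative assumptions, not measured or quoted figures; substitute your own numbers.

```python
import math

PAGE_SIZE = 1_000  # maximum objects returned per LIST request


def estimate_listing(num_objects, latency_s=0.1, usd_per_1000_requests=0.005):
    """Estimate requests, wall-clock seconds, and cost of one sequential listing.

    latency_s and usd_per_1000_requests are assumed values for illustration.
    """
    requests = math.ceil(num_objects / PAGE_SIZE)
    seconds = requests * latency_s  # pages are fetched one after another
    cost_usd = requests / 1000 * usd_per_1000_requests
    return requests, seconds, cost_usd


requests, seconds, cost_usd = estimate_listing(10_000_000)
print(f"{requests} requests, ~{seconds / 60:.1f} min sequential, ~${cost_usd:.2f}")
```

Under these assumptions the dollar cost of a single pass is small, while the sequential round trips add up to minutes, which is why listing shows up as job startup time.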
- S3 prefixes are not directories. A prefix scan does not benefit from directory-like structure — it is a linear scan filtered server-side.
- S3 Inventory (an offline listing report) is often a better fit than real-time LIST for large-scale enumeration, but its reports can lag the bucket by 24-48 hours.
- AWS S3
  constrained_by Object Listing Performance: inherent API limitation
- DuckDB, Trino
  constrained_by Object Listing Performance: query engines pay the listing cost
- Table formats reduce listing dependency by maintaining manifests, but metadata itself must be listed
  scoped_to S3, Object Storage
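The pagination behaviour behind these points can be sketched as a continuation-token loop. `list_all_keys` below uses the `list_objects_v2` call shape and the response fields (`Contents`, `IsTruncated`, `NextContinuationToken`) of the ListObjectsV2 API. `InMemoryBucket` is a hypothetical stand-in written for this example so the sketch runs without AWS credentials; its offset-based token is an assumption, since real tokens are opaque.

```python
class InMemoryBucket:
    """Hypothetical stand-in for an S3 client; mimics the ListObjectsV2 response shape."""

    def __init__(self, keys, page_size=1000):
        self._keys = sorted(keys)  # flat key space: no directories, just strings
        self._page_size = page_size
        self.calls = 0

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        self.calls += 1
        # A prefix is only a string filter over the flat key space.
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0
        page = matching[start:start + self._page_size]
        truncated = start + len(page) < len(matching)
        response = {
            "Contents": [{"Key": k} for k in page],
            "KeyCount": len(page),
            "IsTruncated": truncated,
        }
        if truncated:
            response["NextContinuationToken"] = str(start + len(page))
        return response


def list_all_keys(client, bucket, prefix=""):
    """Collect every key under a prefix, one page per request."""
    keys, token = [], None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            kwargs["ContinuationToken"] = token
        response = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        token = response["NextContinuationToken"]  # next page depends on this one


bucket = InMemoryBucket(
    f"events/date=2024-01-{d:02d}/part-{i:04d}.parquet"
    for d in (1, 2)
    for i in range(1500)
)
keys = list_all_keys(bucket, "example-bucket", prefix="events/date=2024-01-01/")
print(len(keys), "keys in", bucket.calls, "requests")
```

Each page needs the token from the previous one, so a single prefix cannot be fetched out of order. With a real boto3 S3 client the same loop should work unchanged, and boto3's `get_paginator("list_objects_v2")` wraps the same token handling.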
Definition
The slowness and cost of listing large numbers of objects in S3's flat namespace using prefix-based scans.
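Because each prefix scan is paginated sequentially, one common mitigation is to fan out across several disjoint prefixes at once. The sketch below assumes the prefixes are known up front (for example, from a partitioning scheme). `list_prefixes_in_parallel`, `fake_list_prefix`, and the key layout are hypothetical names invented for illustration; in practice `list_prefix` would be a paginated LIST loop over one prefix.

```python
from concurrent.futures import ThreadPoolExecutor


def list_prefixes_in_parallel(list_prefix, prefixes, max_workers=8):
    """Run one sequential prefix scan per prefix concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input prefixes.
        results = pool.map(list_prefix, prefixes)
        return [key for keys in results for key in keys]


# Hypothetical in-memory key space standing in for a bucket.
ALL_KEYS = sorted(
    f"logs/shard={s}/file-{i:03d}.json" for s in "abcd" for i in range(250)
)


def fake_list_prefix(prefix):
    # Stand-in for a paginated LIST loop over a single prefix.
    return [k for k in ALL_KEYS if k.startswith(prefix)]


shard_prefixes = [f"logs/shard={s}/" for s in "abcd"]
merged = list_prefixes_in_parallel(fake_list_prefix, shard_prefixes)
print(len(merged))
```

Threads suit this pattern because the work is I/O-bound. The fan-out shortens wall-clock time but does not reduce the number of LIST requests issued.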
Resources
Official AWS API reference for ListObjectsV2, documenting the 1,000-object-per-request limit and pagination mechanisms that constrain listing performance.
AWS's official performance design patterns covering S3 Inventory as an alternative to listing, prefix parallelization, and caching strategies for large-scale object enumeration.
Deep engineering investigation into why S3 ListObjects can take 120+ seconds, revealing how delete markers and versioning cause severe performance degradation.