Manifest Pruning
The optimization technique used by table formats (especially Iceberg) to skip reading irrelevant manifest files during query planning by using upper-level metadata (manifest lists) to eliminate manifests whose data files cannot match the query predicates.
Summary
Manifest pruning is a critical performance optimization for large Iceberg tables on S3. Without it, query planning requires reading every manifest file (one S3 GET per manifest), which at scale can mean thousands of requests before a single data file is read.
- Manifest pruning effectiveness depends on data organization. If data for a given predicate value is spread across all manifests (poor clustering), pruning eliminates nothing.
- Manifest pruning operates on partition-level bounds stored in the manifest list. It does not use column-level min/max statistics — that happens at the data file level during file pruning.
- Over-partitioning can increase the number of manifests that planning has to consider, so partition design directly affects manifest pruning efficiency.
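The points above can be illustrated with a minimal sketch. It is not the Iceberg API: `ManifestSummary` and `prune_manifests` are hypothetical names, and each manifest-list entry is assumed to carry a single lower and upper bound for one date partition field. It shows how partition bounds let the planner skip manifests, and how poorly clustered data defeats the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestSummary:
    """Hypothetical stand-in for one manifest-list entry (not the Iceberg API)."""
    path: str
    lower: str  # lowest partition value among the manifest's data files
    upper: str  # highest partition value among the manifest's data files


def prune_manifests(manifests, value):
    """Keep only manifests whose partition bounds could contain `value`."""
    return [m for m in manifests if m.lower <= value <= m.upper]


# Well clustered: each manifest covers a narrow date range.
clustered = [
    ManifestSummary("m1.avro", "2024-01-01", "2024-01-31"),
    ManifestSummary("m2.avro", "2024-02-01", "2024-02-29"),
    ManifestSummary("m3.avro", "2024-03-01", "2024-03-31"),
]

# Poorly clustered: every manifest spans the whole quarter.
scattered = [
    ManifestSummary(f"s{i}.avro", "2024-01-01", "2024-03-31")
    for i in range(1, 4)
]

# ISO-8601 date strings compare correctly as plain strings.
clustered_hits = prune_manifests(clustered, "2024-02-14")
scattered_hits = prune_manifests(scattered, "2024-02-14")

print([m.path for m in clustered_hits])  # only the February manifest survives
print(len(scattered_hits))               # all three must still be read
```

In the clustered layout only one manifest needs an S3 GET. In the scattered layout every manifest's bounds overlap the predicate, so none can be skipped.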
solves Metadata Overhead at Scale: reduces the number of manifest files read during planning
solves Cold Scan Latency: fewer S3 GETs during query planning means faster time-to-first-row
depends_on Clustering / Sort Order: well-organized data produces more prunable manifests
scoped_to Apache Iceberg, S3: Iceberg's metadata pruning mechanism
Definition
The practice of periodically cleaning up expired snapshots, orphaned manifests, and unreferenced data files from Iceberg, Delta, or Hudi tables on S3 to reclaim storage and reduce metadata scan overhead.
Table formats accumulate metadata over time — each commit creates new manifest files, and time-travel retention keeps old snapshots. Without pruning, metadata growth degrades query planning performance and inflates S3 storage costs.
Typical applications include Iceberg snapshot expiration and orphan file cleanup, Delta VACUUM operations, and metadata size management for high-frequency write tables.
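The cleanup logic described here can be sketched in a few lines. This is a simplified model, not how any table format implements it: `Snapshot` and `expire_snapshots` are hypothetical names, timestamps are plain integers, and a manifest is treated as removable once no retained snapshot references it. Real tables would rely on the format's own maintenance operations, such as the Iceberg snapshot expiration and Delta VACUUM mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record: commit time plus the manifests it references."""
    snapshot_id: int
    committed_at: int      # epoch seconds
    manifests: frozenset   # manifest file names referenced by this snapshot


def expire_snapshots(snapshots, older_than, retain_last=1):
    """Return (kept, expired, removable_manifests).

    A snapshot is kept if it was committed at or after `older_than`,
    or if it is among the `retain_last` most recent snapshots.
    """
    ordered = sorted(snapshots, key=lambda s: s.committed_at)
    protected = ordered[-retain_last:] if retain_last > 0 else []
    kept = [s for s in ordered if s.committed_at >= older_than or s in protected]
    expired = [s for s in ordered if s not in kept]

    # A manifest is only removable when no kept snapshot still points at it.
    still_referenced = set().union(*(s.manifests for s in kept))
    candidates = set().union(*(s.manifests for s in expired))
    return kept, expired, sorted(candidates - still_referenced)


snapshots = [
    Snapshot(1, 100, frozenset({"a.avro"})),
    Snapshot(2, 200, frozenset({"a.avro", "b.avro"})),
    Snapshot(3, 300, frozenset({"b.avro", "c.avro"})),
]

kept, expired, removable = expire_snapshots(snapshots, older_than=250)
print([s.snapshot_id for s in kept])     # snapshots that remain available for time travel
print([s.snapshot_id for s in expired])  # snapshots dropped from table metadata
print(removable)                         # manifests no retained snapshot references
```

Here `b.avro` survives even though an expired snapshot used it, because the retained snapshot still references it. Checking references before deleting is what keeps cleanup from breaking the current table state.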
Resources
Iceberg specification section on manifest lists defining the metadata structure that enables partition pruning and manifest-level file skipping.
Iceberg performance guide covering scan planning, manifest pruning, and predicate pushdown for minimizing S3 reads.
Delta Lake data skipping documentation explaining how file-level statistics enable manifest-level pruning on S3.