Architecture

Deletion Vector

A metadata pattern that tracks which rows in a data file have been logically deleted or updated, using a compact bitmap instead of rewriting the entire file.

Summary

Where it fits

Deletion vectors are the key mechanism that makes merge-on-read (MoR) practical for lakehouse table formats on S3. Instead of the expensive copy-on-write approach, which rewrites a whole Parquet file (128 MB, say) to delete one row, the writer adds a tiny deletion vector file that marks the invalidated rows. Query engines skip those rows at read time, and periodic compaction rewrites the data files so the deletes are applied physically.
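The write, read, and compaction paths described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any engine implements it: DataFile, delete_where, scan, and compact are hypothetical names, a Python list stands in for a Parquet file, and a set stands in for the bitmap.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Stand-in for an immutable Parquet file; rows are addressed by position."""
    rows: list
    # Hypothetical in-memory deletion vector: positions of logically deleted rows.
    deleted: set = field(default_factory=set)


def delete_where(f: DataFile, predicate) -> None:
    """Merge-on-read delete: record matching positions, leave f.rows untouched."""
    for pos, row in enumerate(f.rows):
        if predicate(row):
            f.deleted.add(pos)


def scan(f: DataFile) -> list:
    """Read path: skip every position marked in the deletion vector."""
    return [row for pos, row in enumerate(f.rows) if pos not in f.deleted]


def compact(f: DataFile) -> DataFile:
    """Compaction: rewrite the file without deleted rows, with an empty vector."""
    return DataFile(rows=scan(f))


orders = DataFile(rows=[{"id": 1}, {"id": 2}, {"id": 3}])
delete_where(orders, lambda r: r["id"] == 2)  # cheap: only the vector changes
visible = scan(orders)                        # readers filter at scan time
compacted = compact(orders)                   # the expensive rewrite, done later
```

The delete touches only the small vector; the cost of rewriting the rows is deferred to compact, which is the trade merge-on-read makes.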

Misconceptions / Traps
  • Deletion vectors improve write performance at the cost of read performance. Queries must load and apply the deletion vector of every data file that has one, which adds read overhead until compaction runs.
  • Not all engines support deletion vectors equally. Check your query engine's support before depending on this pattern for high-throughput reads.

Key Connections
  • enables Apache Iceberg, Delta Lake — efficient row-level operations
  • solves Small Files Problem — reduces write amplification
  • scoped_to Table Formats, S3

Definition

What it is

A metadata pattern used by lakehouse table formats to track which rows in a data file have been deleted or updated, without rewriting the entire data file. Instead of copy-on-write, a compact bitmap or vector records the positions of invalidated rows.

Why it exists

Copy-on-write (CoW) updates in lakehouse formats require rewriting entire Parquet files to delete or update a single row, causing massive write amplification. Deletion vectors enable merge-on-read (MoR) by recording row-level deletions in lightweight metadata files, dramatically reducing write costs for high-frequency update workloads.
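A rough back-of-the-envelope calculation shows the scale of the difference. Every number below is an assumption chosen for illustration (a 128 MiB data file, a 100-byte row, a 64-byte deletion vector), and the sketch ignores compression, file footers, and table metadata commits.

```python
# All sizes are illustrative assumptions, not measurements.
FILE_SIZE_BYTES = 128 * 1024 * 1024  # assumed size of one data file
ROW_SIZE_BYTES = 100                 # assumed average size of the deleted row
DV_SIZE_BYTES = 64                   # assumed size of a small deletion vector


def write_amplification(bytes_written: int, bytes_changed: int) -> float:
    """Bytes physically written per byte of logical change."""
    return bytes_written / bytes_changed


# Copy-on-write: rewrite every remaining row of the file to drop one row.
cow = write_amplification(FILE_SIZE_BYTES - ROW_SIZE_BYTES, ROW_SIZE_BYTES)

# Merge-on-read: write only the deletion vector.
mor = write_amplification(DV_SIZE_BYTES, ROW_SIZE_BYTES)

print(f"copy-on-write amplification: {cow:,.0f}x")
print(f"merge-on-read amplification: {mor:.2f}x")
```

Under these assumptions copy-on-write writes more than a million bytes per byte deleted, while the deletion vector is smaller than the row it invalidates. Real ratios depend on file size, encoding, and how many rows each commit touches.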

Primary use cases

Efficient row-level deletes and updates in Iceberg and Delta Lake, high-frequency CDC ingestion with low write amplification, streaming update workloads on S3.

Recent developments

Latest signals
  • Deletion vectors are now production-ready as of Apache Iceberg 1.11.0 (May 19, 2026). Each deletion vector is a Roaring bitmap stored in the Puffin file format, with a 1:1 relationship between a data file and its deletion vector. This is the V3 mechanism that replaces V2 positional delete files. The 1.11.0 GA moves deletion vectors from experimental, flag-gated status to the stable supported path, enabling broad engine rollout (Spark 4.1, Flink 2.1, and the wider multi-engine ecosystem). Per What's New in Apache Iceberg 1.11.0 (Dremio).
  • Apache Iceberg 1.11.0 (May 2026) ships V3 deletion vectors as production-ready. The release marks V3 maturity: deletion vectors, the Variant type, and the remaining V3 features are now production-grade across the major engines. Per Apache Iceberg 1.11.0 Release: Deletion Vectors, Variant Type, V3 Maturity (Dataverses).
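  • Roaring bitmaps in Puffin sidecar files, with at most one DV per data file. Each data file (file_A.parquet, say) has at most one deletion vector: a Roaring bitmap of deleted row positions, stored as a blob in a Puffin file. Picturing a matching file_A.puffin is a simplification, since a single Puffin file can hold DV blobs for several data files. Positions can be 64-bit, but the layout is optimized for the common case of positions that fit in 32 bits. Per the Apache Iceberg spec (Deletion Vectors).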
  • AWS EMR adds first-class deletion-vector support. EMR Iceberg integration uses V3 deletion vectors automatically — production-grade Spark + EMR pipelines get the V3 benefits with no config changes. Per AWS Big Data Blog — Unlock the Power of Apache Iceberg V3 Deletion Vectors on EMR.
  • CDC pipeline performance: up to 10× faster MERGE/UPDATE than with V2 position-delete files. V2 incurs read amplification because queries must join data files against delete files at query time. DV-aware engines instead read each data file's bitmap once at scan start and avoid the per-row join. Per What Iceberg V3 Advances Mean for CDC Pipelines (DataLakehouseHub).
  • MinIO AIStor Tables ships native V3 deletion-vector support on-prem. First on-prem object storage to ship Iceberg V3 deletion-vector compaction natively in the bucket — no separate compute layer needed. Per MinIO Blog — AIStor Tables: Native Iceberg V3 for On-Prem Object Storage.
  • Cross-format consensus: Iceberg V3 + Delta Lake + Hudi 1.x all support deletion vectors as soft deletes. All three major table formats now ship deletion-vector support; the architectural argument that "MoR with DVs beats CoW for high-frequency updates" is now uncontested across the table-format ecosystem. Per Olake — Comparing Delete Methods in Iceberg & Delta Lake.
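The 64-bit position handling and the read-once scan pattern mentioned in the signals above can be illustrated with a toy structure. This is a sketch under stated assumptions, not the Roaring or Puffin implementation: PositionBitmap is a hypothetical class, and a plain Python set stands in for each 32-bit Roaring bitmap. It shows only the idea of splitting a position into a high 32-bit key and a low 32-bit value, so that files with fewer than 2**32 rows use a single key.

```python
from collections import defaultdict


class PositionBitmap:
    """Toy stand-in for a 64-bit bitmap of deleted row positions."""

    def __init__(self):
        # high 32 bits of the position -> set of low 32-bit values
        self._buckets = defaultdict(set)

    def add(self, position: int) -> None:
        if not 0 <= position < 2**64:
            raise ValueError("position must fit in an unsigned 64-bit integer")
        self._buckets[position >> 32].add(position & 0xFFFFFFFF)

    def __contains__(self, position: int) -> bool:
        # .get avoids creating empty buckets on lookups
        bucket = self._buckets.get(position >> 32)
        return bucket is not None and (position & 0xFFFFFFFF) in bucket

    def cardinality(self) -> int:
        return sum(len(b) for b in self._buckets.values())

    def key_count(self) -> int:
        return len(self._buckets)


# Build the vector once per data file ...
dv = PositionBitmap()
for pos in (3, 7, 2**32 + 5):
    dv.add(pos)

# ... then test each row position during the scan, with no join against
# separate delete files.
rows = ["a", "b", "c", "d", "e"]
live = [r for pos, r in enumerate(rows) if pos not in dv]
```

Only the position at 2**32 + 5 needs a second key; the two small positions share key 0, which is the common case the format is optimized for.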
