Read / Write Amplification
The ratio of the actual bytes read from or written to S3 to the logical data volume an operation involves. It arises from the immutable file formats, copy-on-write semantics, and metadata overhead inherent in S3-based table formats.
Summary
Read/write amplification quantifies the hidden I/O cost of operations on S3-based lakehouses. A single row update in Iceberg's copy-on-write mode rewrites an entire data file (write amplification); a query that needs 100 rows may read entire Parquet row groups (read amplification). Both inflate S3 costs and latency.
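The arithmetic behind these two examples can be sketched in a few lines. The row, data-file, and row-group sizes below are assumptions chosen for illustration, not values from any particular table, and the `amplification` helper is hypothetical. The sketch makes two simplifications. It counts only the bytes written for the copy-on-write case and ignores the read of the old file that a rewrite also needs. It also assumes every column is projected in the read case.

```python
# Illustrative sizes only (assumptions); real values depend on table layout.
MIB = 1024 * 1024


def amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Physical bytes moved to/from S3 divided by the logical bytes needed."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


ROW_BYTES = 200                # assumed encoded size of one row
DATA_FILE_BYTES = 512 * MIB    # assumed data file size
ROW_GROUP_BYTES = 128 * MIB    # assumed Parquet row group size

# Write amplification: a copy-on-write update of one row rewrites its whole file
# (only bytes written are counted; the read of the old file is ignored here).
cow_write_amp = amplification(DATA_FILE_BYTES, ROW_BYTES)

# Read amplification: 100 needed rows that fall in one row group, assuming
# all columns are projected so the whole row group is fetched.
read_amp = amplification(ROW_GROUP_BYTES, 100 * ROW_BYTES)

print(f"copy-on-write write amplification: {cow_write_amp:,.0f}x")
print(f"row-group read amplification: {read_amp:,.0f}x")
```

Shrinking the file or row-group size lowers these ratios. The trade-off is more objects and more S3 requests.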
- Merge-on-read (Iceberg, Hudi MOR) reduces write amplification by deferring rewrites. It increases read amplification, because delete files (Iceberg) or log files (Hudi) must be merged with base files at query time. The tradeoff shifts cost from writers to readers.
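- Parquet's columnar format reduces read amplification for column-selective queries, but much less for row-selective ones. Reading one row typically still requires reading the projected column chunks of its entire row group, unless page indexes let the reader narrow the read to individual pages.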
- Compaction reduces read amplification (fewer files to scan) but temporarily increases write amplification (rewriting files). The net effect depends on the read/write ratio of the workload.
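The trade-offs in the list above can be explored with a deliberately crude byte-count model. Everything in it is an assumption for illustration:

- The `Workload` fields, the worst-case rule that every scan reads all accumulated deltas, and the `compaction_pays_off` break-even check are hypothetical.
- The model ignores request counts, merge CPU, and metadata I/O, all of which matter in practice.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-period workload; all sizes in MiB."""
    commits: int              # update commits in the period
    scans: int                # full-table scans in the period
    table_mib: float          # size of the base data files
    changed_mib: float        # size of the rows changed by one commit
    touched_files_mib: float  # size of the data files containing those rows


def copy_on_write_io(w: Workload) -> float:
    # Each commit rewrites every touched data file in full;
    # scans then read only clean base files.
    return w.commits * w.touched_files_mib + w.scans * w.table_mib


def merge_on_read_io(w: Workload) -> float:
    # Each commit writes only a delta roughly the size of the changed rows.
    deltas = w.commits * w.changed_mib
    # Worst case (assumed): all commits land before the scans and no
    # compaction runs, so every scan reads the base files plus all deltas.
    return deltas + w.scans * (w.table_mib + deltas)


def compaction_pays_off(rewrite_mib: float, saved_mib_per_scan: float,
                        scans_after: int) -> bool:
    # Compaction costs a one-time rewrite; it pays off once the bytes it
    # saves across later scans exceed the bytes it rewrote.
    return scans_after * saved_mib_per_scan > rewrite_mib


write_heavy = Workload(commits=100, scans=2, table_mib=10_000,
                       changed_mib=50, touched_files_mib=512)
read_heavy = Workload(commits=100, scans=500, table_mib=10_000,
                      changed_mib=50, touched_files_mib=512)

for name, w in [("write-heavy", write_heavy), ("read-heavy", read_heavy)]:
    print(f"{name}: CoW {copy_on_write_io(w):,.0f} MiB, "
          f"MoR {merge_on_read_io(w):,.0f} MiB")
```

With these assumed numbers, merge-on-read moves fewer bytes for the write-heavy workload (35,000 vs 71,200 MiB). Copy-on-write wins for the read-heavy one (5,051,200 vs 7,505,000 MiB). This is the shift of cost from writers to readers described above. The same reasoning, reduced to a single comparison, gives the compaction break-even.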
scoped_to: Table Formats, S3 (I/O amplification in S3-based tables)
amplifies: Request Pricing Models (amplified I/O means amplified request costs)
constrains: Cold Scan Latency (read amplification increases scan time)
relates_to: Compaction (compaction trades write amplification for reduced read amplification)
Definition
The ratio of the actual bytes read from or written to S3 to the logical bytes the operation needs. Copy-on-write table formats and compaction strategies can amplify physical I/O well beyond the size of the logical change.
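Written out, with B denoting a byte count (notation introduced here for illustration, not taken from any specification), the two ratios are shown below. They are followed by one worked example that uses assumed sizes.

```latex
\mathrm{WA} = \frac{B_{\text{physical, written}}}{B_{\text{logical, changed}}}
\qquad
\mathrm{RA} = \frac{B_{\text{physical, read}}}{B_{\text{logical, needed}}}

% Worked example (assumed sizes): copy-on-write update of a 1 KiB row in a 512 MiB file
\mathrm{WA} = \frac{512\ \text{MiB}}{1\ \text{KiB}}
            = \frac{2^{29}\ \text{B}}{2^{10}\ \text{B}}
            = 2^{19} = 524{,}288
```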
Resources
Iceberg table maintenance documentation covering compaction and snapshot management to control write amplification on S3.
Hudi table types documentation comparing copy-on-write (high write amplification) versus merge-on-read (high read amplification) trade-offs.
Paimon LSM-tree documentation explaining how leveled compaction manages read/write amplification for streaming tables on S3.