Pain Point

Read / Write Amplification

The ratio between the logical data volume involved in an operation and the actual bytes read from or written to S3, arising from immutable file formats, copy-on-write semantics, and metadata overhead inherent in S3-based table formats.


Summary

What it is

Where it fits

Read/write amplification quantifies the hidden I/O cost of operations on S3-based lakehouses. A single row update in Iceberg's copy-on-write mode rewrites an entire data file (write amplification); a query that needs 100 rows may read entire Parquet row groups (read amplification). Both inflate S3 costs and latency.
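
As a rough illustration of the two cases above, the ratio can be computed directly from physical and logical byte counts. This is a minimal sketch: the row, data file, and row group sizes are hypothetical values chosen for the example, not figures from any particular table.

```python
def amplification(physical_bytes: int, logical_bytes: int) -> float:
    """Physical bytes moved to or from storage per logical byte the operation needed."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


# Hypothetical sizes, chosen only for illustration.
ROW_BYTES = 200                      # one logical row
DATA_FILE_BYTES = 128 * 1024 * 1024  # one copy-on-write data file
ROW_GROUP_BYTES = 64 * 1024 * 1024   # one Parquet row group

# A copy-on-write update of a single row rewrites the whole data file.
write_amp = amplification(DATA_FILE_BYTES, ROW_BYTES)

# A query that needs 100 rows, all in one row group that is read in full.
read_amp = amplification(ROW_GROUP_BYTES, 100 * ROW_BYTES)

print(f"write amplification: {write_amp:,.0f}x")
print(f"read amplification:  {read_amp:,.0f}x")
```

Under these assumed sizes both ratios are in the thousands or higher, which is why a small logical change can dominate the S3 bill.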

Misconceptions / Traps

Merge-on-read (Iceberg, Hudi MOR) reduces write amplification by deferring rewrites but increases read amplification, because delete files (Iceberg) or log files (Hudi) must be merged with the base files at query time. The tradeoff shifts cost from writers to readers.
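Parquet's columnar format reduces read amplification for column-selective queries but not for row-selective queries. Unless the reader can skip pages using page-level indexes, fetching one row still means reading the full column chunk of every projected column in the enclosing row group.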
Compaction reduces read amplification (fewer files to scan) but temporarily increases write amplification (rewriting files). The net effect depends on the read/write ratio of the workload.
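
The copy-on-write versus merge-on-read tradeoff in the first item above can be sketched with a toy byte-count model. Everything here is an assumption made for illustration: the `Workload` fields, the sizes, and the simplifications (all updates land before the reads, no compaction runs in between, and the read that a copy-on-write rewrite itself needs is ignored). It is not how any specific engine accounts for I/O.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    updates: int             # single-row updates, each in its own commit
    reads: int               # full reads of the affected data file
    data_file_bytes: int     # size of the base data file
    delete_entry_bytes: int  # size of one delete/log record per update


def copy_on_write_bytes(w: Workload) -> tuple[int, int]:
    """Return (bytes_written, bytes_read): every update rewrites the file."""
    written = w.updates * w.data_file_bytes
    read = w.reads * w.data_file_bytes
    return written, read


def merge_on_read_bytes(w: Workload) -> tuple[int, int]:
    """Return (bytes_written, bytes_read): updates append small records, and
    every read fetches the base file plus all accumulated records."""
    written = w.updates * w.delete_entry_bytes
    read = w.reads * (w.data_file_bytes + w.updates * w.delete_entry_bytes)
    return written, read


# Hypothetical workload: 100 updates, then 10 reads of a 128 MiB file.
w = Workload(updates=100, reads=10,
             data_file_bytes=128 * 1024 * 1024, delete_entry_bytes=1024)
cow = copy_on_write_bytes(w)
mor = merge_on_read_bytes(w)
print("copy-on-write (written, read):", cow)
print("merge-on-read (written, read):", mor)
```

In this model merge-on-read always writes less and reads more; which mode moves fewer total bytes depends on the ratio of `updates` to `reads`.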

Key Connections

scoped_to Table Formats, S3 — I/O amplification in S3-based tables
amplifies Request Pricing Models — amplified I/O means amplified request costs
constrains Cold Scan Latency — read amplification increases scan time
relates_to Compaction — compaction trades write amplification for reduced read amplification
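
The compaction and request-pricing connections above can be made concrete with a break-even estimate counted in S3 requests. This is a sketch under stated assumptions: the default of two GET requests per file per scan is hypothetical, multipart uploads and metadata commits are ignored, and GETs and PUTs are counted as equal even though real pricing distinguishes them.

```python
import math


def scan_requests(num_files: int, requests_per_file: int = 2) -> int:
    """GET requests for one full scan. Assumes (hypothetically) one footer
    read plus one data read per file."""
    return num_files * requests_per_file


def compaction_requests(num_small_files: int, num_output_files: int) -> int:
    """Requests spent compacting: one GET per input file and one PUT per
    output file. Ignores multipart uploads and metadata commits."""
    return num_small_files + num_output_files


def break_even_scans(num_small_files: int, num_output_files: int,
                     requests_per_file: int = 2) -> float:
    """Number of scans after which compaction has saved more requests
    than it spent. Returns infinity if compaction saves nothing."""
    saved_per_scan = (scan_requests(num_small_files, requests_per_file)
                      - scan_requests(num_output_files, requests_per_file))
    if saved_per_scan <= 0:
        return math.inf
    return compaction_requests(num_small_files, num_output_files) / saved_per_scan


# Hypothetical case: 1,000 small files compacted into 10 larger ones.
print(break_even_scans(1000, 10))
```

A read-heavy table recovers the compaction cost quickly in this model, while a table that is rarely scanned may never do so.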

Definition

What it is

The ratio of the actual bytes read from or written to S3 to the logical bytes the operation needs. Copy-on-write table formats and compaction strategies can amplify physical I/O well beyond the logical change size.

Recent developments

Latest signals

LSM-tree write amplification can reach 10× at an Lk-1 → Lk merge. In traditional LSM engines (LevelDB, RocksDB), promoting one file from level Lk-1 to Lk may require reading and rewriting 10 files in the worst case, for a write amplification of about 10×. This is the structural cost of LSM-tree compaction, and any LSM-based table format inherits it. Per Alibaba Cloud, "An In-depth Discussion on the LSM Compaction Mechanism".
SSD wear-out is the second-order failure mode driven by write amplification. SSDs have a limited number of program/erase cycles, and the high write amplification of LSM compaction accelerates wear-out and failure. SSD replacement cost is the often-uncounted half of the write-amplification bill. Per Medium, "Deep Dive on Read, Write, and Space Amplification in SSDs and LSM Storage Engines".
2026 research direction: identifying "Unchanged Data Blocks" during compaction. Recent research observes that many data blocks are read and written back unaltered during LSM compaction, which is pure write-amplification waste. Identifying these blocks at compaction time and remapping them rather than rewriting them eliminates the unnecessary writes. Per IEEE, "RemapCom: Optimizing Compaction Performance of LSM Trees via Data Block Remapping in SSDs".
Hybrid NVM-SSD compaction: fine-grained on the NVM tier, selective on the SSD tier. A 2026 hybrid-compaction approach uses fine-grained compaction on NVM (where write cost is low and write endurance is high) and selective compaction on SSD (where write cost is high and endurance is limited). It improves end-to-end efficiency and alleviates both write amplification and write stalls. Per Springer, "Reducing Write Stall and Write Amplification for LSM-Tree KV Stores Using Hybrid Compression".
ZNS-SSD-aware LSM compaction (ACM HotStorage 2022) remains the 2026 reference for write-aware design. Lifetime-leveling LSM-tree compaction for ZNS SSDs aligns LSM writes with the SSD's underlying zone structure, eliminating the write-amplification overhead of FTL-style indirection on traditional SSDs. Per ACM HotStorage, "Lifetime-Leveling LSM-Tree Compaction for ZNS SSD".
GPU-accelerated LSM compaction is emerging as a 2024-2025 acceleration path. An MSST 2024 paper on a GPU-accelerated compaction strategy for LSM-based KV stores uses GPU parallelism for the compaction-heavy work, freeing the CPU and reducing compaction-stall time. The pattern is starting to show up in production-grade KV stores. Per MSST, "GPU-Accelerated Compaction Strategy for LSM-Based KV Store".
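
The roughly 10× figure in the first item of this list follows from the size ratio between adjacent levels. The sketch below is a back-of-the-envelope model, not a measurement: it assumes leveled compaction, a fixed size ratio between levels (10 is the assumed value here), worst-case overlap at every merge, and one initial write when data is first flushed. Write-ahead logging and any engine-specific optimizations are ignored.

```python
def per_merge_write_amp(size_ratio: int) -> int:
    """Worst case for one Lk-1 -> Lk merge: the promoted file overlaps about
    `size_ratio` files in the next level, and all of them are rewritten."""
    return size_ratio


def total_write_amp(size_ratio: int, level_transitions: int) -> int:
    """Rough upper bound over the life of a key: one initial flush plus a
    worst-case merge at every level transition."""
    return 1 + per_merge_write_amp(size_ratio) * level_transitions


# Assumed size ratio of 10, with data eventually pushed down 4 transitions.
print(per_merge_write_amp(10))
print(total_write_amp(10, 4))
```

The model shows why write amplification grows with tree depth: each extra level adds another worst-case merge on top of the per-merge cost.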