Kafka Tiered Storage
An Apache Kafka feature (KIP-405) that offloads older log segments from broker-local disks to S3-compatible object storage, extending Kafka's retention capacity without scaling broker storage proportionally.
Summary
Kafka Tiered Storage bridges the gap between real-time event streaming and long-term S3 storage. By transparently moving cold log segments to S3, it allows Kafka to serve as both the streaming platform and a long-retention event archive, reducing the need for separate S3 sink connectors for archival.
- Tiered storage does not eliminate the need for local disk. Active segments and recent (hot) data still reside on broker disks for low-latency consumption.
- Reading from the remote (S3) tier has higher latency than reading from local disk. Consumer applications that replay old data will incur S3 GET latency on those reads.
- Not all Kafka distributions implement tiered storage identically. Confluent's implementation differs from the KIP-405 implementation in Apache Kafka in both configuration and maturity.
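The caveats above correspond to a small number of topic-level settings in the Apache Kafka implementation. As a minimal sketch, the helper below builds those overrides and rejects combinations that contradict the caveats. The property names `remote.storage.enable`, `retention.ms`, and `local.retention.ms` are Apache Kafka topic configs; the helper name `tiered_topic_config` and the retention values are illustrative assumptions, and other distributions may use different property names.

```python
DAY_MS = 24 * 60 * 60 * 1000


def tiered_topic_config(total_retention_ms: int, local_retention_ms: int) -> dict:
    """Build topic-level overrides for tiered storage (hypothetical helper).

    Kafka itself also accepts special values such as -2 for local.retention.ms;
    this sketch deliberately requires an explicit positive local retention.
    """
    if local_retention_ms <= 0:
        # Hot data always stays on broker disk, so local retention cannot be zero.
        raise ValueError("local retention must be positive")
    if local_retention_ms > total_retention_ms:
        # The local tier is a subset of the total retention window.
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",                # tier this topic
        "retention.ms": str(total_retention_ms),        # local + remote window
        "local.retention.ms": str(local_retention_ms),  # broker-disk window only
    }


# Assumed example: keep one year in total, one day on broker disks.
config = tiered_topic_config(total_retention_ms=365 * DAY_MS,
                             local_retention_ms=1 * DAY_MS)
for key, value in sorted(config.items()):
    print(f"{key}={value}")
```

The resulting key/value pairs could be passed to whatever tooling already manages topic configs. Tiering must also be enabled on the brokers (in Apache Kafka, `remote.log.storage.system.enable=true` plus a remote storage manager plugin for the target object store), which this sketch does not cover.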
- scoped_to: S3, Object Storage (offloads Kafka log segments to S3)
- enables: Event-Driven Ingestion (long-retention event streams without broker scaling)
- used_by: Debezium (CDC events benefit from extended retention on S3)
- relates_to: Tiered Storage (Kafka-specific instance of the tiered storage pattern)
Definition
A Kafka feature (KIP-405) that offloads older log segments from local broker disks to S3-compatible object storage, so that retention is bounded by object storage capacity and cost rather than by broker disk size.
Kafka brokers traditionally store all retained data on local disk, forcing a tradeoff between retention period and disk cost. Tiered storage breaks this constraint by moving cold segments to S3, keeping only hot data on fast local storage.
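The retention-versus-disk tradeoff can be made concrete with rough arithmetic. The sketch below estimates cluster-wide broker disk for a given retention window, first with all data on brokers and then with only a short hot window kept locally. The function name, ingest rate, replication factor, and retention windows are assumed values for illustration; the estimate ignores compression, index files, and free-space headroom, and it does not model the cost of the object storage itself.

```python
def local_disk_gb(ingest_mb_per_s: float, retention_hours: float,
                  replication_factor: int) -> float:
    """Approximate cluster-wide broker disk (GB) needed to hold the window."""
    seconds = retention_hours * 3600
    # Every replica keeps its own local copy of the retained data.
    return ingest_mb_per_s * seconds * replication_factor / 1024


# Assumed workload: 50 MB/s ingest, replication factor 3.
without_tiering = local_disk_gb(50, retention_hours=30 * 24, replication_factor=3)
with_tiering = local_disk_gb(50, retention_hours=6, replication_factor=3)

print(f"30 days on broker disks: {without_tiering:,.0f} GB")
print(f"6 hours on broker disks: {with_tiering:,.0f} GB")
print(f"reduction factor: {without_tiering / with_tiering:.0f}x")
```

Under these assumptions the broker disk requirement scales with the local window alone, so extending total retention changes the object storage footprint but not the broker sizing.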
Primary use cases: long-term Kafka log retention on S3, cost-effective event replay from object storage, and decoupling Kafka retention from broker disk capacity.
Resources
KIP-405 is the accepted proposal defining Kafka's tiered storage architecture for offloading log segments to S3 and other object stores.
Confluent's production documentation for tiered storage, the most mature implementation of Kafka-to-S3 log offloading.
Apache Kafka source repository containing the tiered storage implementation for remote log segment management on object storage.