Technology

Kafka Tiered Storage

An Apache Kafka feature (KIP-405) that offloads older, closed log segments from broker-local disks to a pluggable remote tier, most commonly S3-compatible object storage, extending Kafka's retention capacity without scaling broker storage proportionally.

Summary

Where it fits

Kafka Tiered Storage bridges real-time event streaming and long-term storage on S3. Cold log segments move to S3 transparently: consumers use the same fetch API whichever tier holds the data. This lets Kafka serve as both the streaming platform and a long-retention event archive, reducing the need for separate S3 sink connectors purely for archival.
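A minimal model of how a read is routed once segments have been tiered. It assumes the broker tracks a log start offset (oldest offset retained in either tier) and a local log start offset (oldest offset still on broker disk), as Kafka's tiered storage design does. The function name and signature are hypothetical; real routing happens inside the broker and is invisible to the consumer, which is what makes the offload transparent.

```python
def serving_tier(log_start_offset: int, local_log_start_offset: int,
                 log_end_offset: int, fetch_offset: int) -> str:
    """Return which tier would serve a fetch starting at fetch_offset.

    Hypothetical helper, not a Kafka API. Offsets in
    [log_start_offset, local_log_start_offset) exist only in the remote tier;
    offsets from local_log_start_offset onward are still on broker disk.
    """
    if not log_start_offset <= local_log_start_offset <= log_end_offset:
        raise ValueError("inconsistent offsets")
    # A fetch at log_end_offset is valid: the consumer is caught up and waits for new data.
    if fetch_offset < log_start_offset or fetch_offset > log_end_offset:
        raise ValueError("offset out of range")
    return "local" if fetch_offset >= local_log_start_offset else "remote"


# A replaying consumer starts in the remote tier; a caught-up consumer reads locally.
print(serving_tier(0, 2000, 3500, 1500))
print(serving_tier(0, 2000, 3500, 3200))
```

The boundary is a single offset comparison because segments only ever leave local disk from the oldest end of the log, so the remote tier always holds a contiguous prefix.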

Misconceptions / Traps

Tiered storage does not eliminate local disk. Only closed (rolled) segments are uploaded; the active segment always stays on the broker, and recent (hot) data is kept on broker disk for low-latency consumption for as long as local retention (local.retention.ms / local.retention.bytes) allows.
Reads served from the remote (S3) tier have higher latency than reads from local disk. Consumers that replay old data pay S3 GET latency.
Tiered storage is not uniform across Kafka distributions. Confluent Platform ships its own implementation, which differs from Apache Kafka's KIP-405 implementation in configuration and maturity.
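The local-disk point above can be modeled as a per-segment decision. This is a simplified sketch that assumes time-based local retention only (the topic config local.retention.ms), ignores size-based limits and total retention, and uses a hypothetical Segment record and plan_segments function rather than broker internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base_offset: int       # first offset in the segment
    max_timestamp_ms: int  # newest record timestamp in the segment
    is_active: bool        # segment currently being appended to
    uploaded: bool         # already copied to the remote tier


def plan_segments(segments, now_ms, local_retention_ms):
    """Split segments into (to_upload, deletable_locally, keep_local) base offsets."""
    to_upload, deletable, keep = [], [], []
    for seg in segments:
        if seg.is_active:
            keep.append(seg.base_offset)       # the active segment is never tiered
        elif not seg.uploaded:
            to_upload.append(seg.base_offset)  # closed; stays local until the upload finishes
        elif now_ms - seg.max_timestamp_ms > local_retention_ms:
            deletable.append(seg.base_offset)  # safe in the remote tier and past local retention
        else:
            keep.append(seg.base_offset)       # uploaded but still hot
    return to_upload, deletable, keep


segments = [
    Segment(0, 2_000, is_active=False, uploaded=True),
    Segment(100, 8_000, is_active=False, uploaded=True),
    Segment(200, 9_000, is_active=False, uploaded=False),
    Segment(300, 9_900, is_active=True, uploaded=False),
]
print(plan_segments(segments, now_ms=10_000, local_retention_ms=3_000))
```

The ordering of the checks is the point: a segment is only eligible for local deletion after it has been uploaded, so the remote copy always exists before the local one disappears.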

Key Connections

scoped_to S3, Object Storage — offloads Kafka log segments to S3
enables Event-Driven Ingestion — long-retention event streams without broker scaling
used_by Debezium — CDC events benefit from extended retention on S3
relates_to Tiered Storage — Kafka-specific instance of the tiered storage pattern

Definition

What it is

A Kafka feature (KIP-405) that offloads older log segments from local broker disks to S3-compatible object storage, enabling virtually unlimited retention without scaling broker storage.

Why it exists

Kafka brokers traditionally store all retained data on local disk, forcing a tradeoff between retention period and disk cost. Tiered storage breaks this constraint by moving cold segments to S3, keeping only hot data on fast local storage.

Primary use cases

Long-term Kafka log retention on S3, cost-effective event replay from object storage, decoupling Kafka retention from broker disk capacity.
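Decoupling retention from broker disk comes down to a few topic-level settings, on top of broker-side setup (remote.log.storage.system.enable=true plus a RemoteStorageManager plugin for the chosen object store, typically third-party for S3). The helper below is a hypothetical sketch that assembles the Apache Kafka topic configs remote.storage.enable, retention.ms, and local.retention.ms, and applies its own sanity check that local retention does not exceed total retention. Confluent Platform uses different keys, so this applies to Apache Kafka only.

```python
def tiered_topic_config(retention_ms: int, local_retention_ms: int) -> dict:
    """Build Apache Kafka topic configs for a tiered topic (hypothetical helper).

    retention_ms: total retention across both tiers; -1 keeps data forever.
    local_retention_ms: how long data stays on broker disk; -2 means
    "same as retention.ms" (Kafka's default for local.retention.ms).
    """
    if retention_ms < -1:
        raise ValueError("retention_ms must be -1 or >= 0")
    # Our own check, deliberately strict: accept only -2 or a non-negative value.
    if local_retention_ms != -2 and local_retention_ms < 0:
        raise ValueError("local_retention_ms must be -2 or >= 0")
    if retention_ms != -1 and local_retention_ms > retention_ms:
        raise ValueError("local retention cannot exceed total retention")
    return {
        "remote.storage.enable": "true",
        "retention.ms": str(retention_ms),
        "local.retention.ms": str(local_retention_ms),
    }


# 30 days total (mostly in object storage), 1 day hot on broker disk.
cfg = tiered_topic_config(retention_ms=30 * 86_400_000, local_retention_ms=86_400_000)
# Comma-separated key=value form, as accepted by kafka-configs.sh --add-config.
print(",".join(f"{k}={v}" for k, v in cfg.items()))
```

The resulting string can be passed to kafka-configs.sh --alter --entity-type topics --entity-name with the topic name and --add-config, or the same keys can be set when the topic is created.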