Technology

Kafka Tiered Storage

An Apache Kafka feature (KIP-405) that offloads older, closed log segments from broker-local disks to remote storage, typically S3-compatible object storage, extending Kafka's retention capacity without scaling broker storage proportionally.


Summary

Where it fits

Kafka Tiered Storage bridges the gap between real-time event streaming and long-term S3 storage. By transparently moving cold log segments to S3, it allows Kafka to serve as both the streaming platform and a long-retention event archive, reducing the need for separate S3 sink connectors for archival.
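
The "transparent" part can be pictured with a toy model of the read path. This is only a sketch, not Kafka's broker code: the class and function names are hypothetical, and the three offsets loosely mirror the log start, local log start, and log end offsets that KIP-405 describes. A consumer asks for an offset, and the broker decides which tier serves it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOffsets:
    """Hypothetical snapshot of one partition's offsets (illustrative only)."""
    log_start_offset: int        # oldest offset still retained in any tier
    local_log_start_offset: int  # oldest offset still on broker-local disk
    log_end_offset: int          # next offset that will be written


def tier_for_fetch(p: PartitionOffsets, fetch_offset: int) -> str:
    """Return which tier would serve a fetch starting at `fetch_offset`."""
    if fetch_offset < p.log_start_offset or fetch_offset > p.log_end_offset:
        return "out_of_range"
    if fetch_offset < p.local_log_start_offset:
        # The segment has already been deleted locally, so it is read from S3.
        return "remote"
    return "local"


# Hypothetical partition: offsets 100..4999 live only in S3, 5000..8999 on disk.
partition = PartitionOffsets(
    log_start_offset=100,
    local_log_start_offset=5000,
    log_end_offset=9000,
)
print(tier_for_fetch(partition, 4999))
print(tier_for_fetch(partition, 5000))
```

The consumer uses the same fetch API in both cases; only the latency differs.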

Misconceptions / Traps
  • Tiered storage does not eliminate the need for local disk. The active segment and recent (hot) data stay on broker disks for low-latency consumption, so brokers must still be sized for the local retention window.
  • Reads served from the remote (S3) tier have higher latency than reads from local disk. Consumers that replay old data pay S3 GET latency on those fetches.
  • Not all Kafka distributions implement tiered storage identically. Confluent's implementation differs from Apache Kafka's KIP-405 implementation in configuration and maturity.
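
The first trap above comes down to two retention windows per topic. The sketch below is for Apache Kafka only; other distributions use different settings. The key names (`remote.storage.enable`, `retention.ms`, `local.retention.ms`) are Apache Kafka topic configs, while the values and the `check_tiered_topic_config` helper are hypothetical. It also assumes the broker has tiered storage switched on (`remote.log.storage.system.enable=true`) and a remote storage plugin configured.

```python
DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical values: keep 30 days in total, but only 1 day on broker disk.
topic_config = {
    "remote.storage.enable": "true",
    "retention.ms": str(30 * DAY_MS),
    "local.retention.ms": str(1 * DAY_MS),
}


def check_tiered_topic_config(config: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a tiered topic config."""
    problems = []
    if config.get("remote.storage.enable") != "true":
        problems.append("remote.storage.enable is not true")

    # retention.ms = -1 means no time limit;
    # local.retention.ms = -2 means "use the retention.ms value".
    total_ms = int(config.get("retention.ms", "-1"))
    local_ms = int(config.get("local.retention.ms", "-2"))
    if total_ms >= 0 and local_ms >= 0 and local_ms > total_ms:
        problems.append("local.retention.ms exceeds retention.ms")
    return problems


print(check_tiered_topic_config(topic_config))
```

A shorter local window saves broker disk but pushes more replay traffic onto the slower remote tier.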

Key Connections
  • scoped_to S3, Object Storage — offloads Kafka log segments to S3
  • enables Event-Driven Ingestion — long-retention event streams without broker scaling
  • used_by Debezium — CDC events benefit from extended retention on S3
  • relates_to Tiered Storage — Kafka-specific instance of the tiered storage pattern

Definition

What it is

A Kafka feature (KIP-405) that offloads older log segments from local broker disks to S3-compatible object storage, so that retention is bounded by object-store capacity and cost rather than by broker disk size.

Why it exists

Kafka brokers traditionally store all retained data on local disk, forcing a tradeoff between retention period and disk cost. Tiered storage breaks this constraint by moving cold segments to S3, keeping only hot data on fast local storage.
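
A back-of-the-envelope calculation makes the tradeoff concrete. Every number below is hypothetical, and the model is deliberately crude: it ignores indexes, compression, the not-yet-rolled active segment, and disk headroom. It also assumes the remote tier holds one logical copy of each segment, with durability left to the object store.

```python
DAY_SEC = 24 * 60 * 60
TB = 10**12


def replicated_disk_bytes(ingress_bytes_per_sec: int,
                          retention_sec: int,
                          replication_factor: int) -> int:
    """Approximate cluster-wide broker disk needed for a retention window."""
    return ingress_bytes_per_sec * retention_sec * replication_factor


# Hypothetical workload: 50 MB/s ingress, replication factor 3.
ingress = 50 * 10**6
rf = 3
total_retention = 30 * DAY_SEC   # what consumers may replay
local_retention = 1 * DAY_SEC    # what stays on broker disk when tiered

without_tiering = replicated_disk_bytes(ingress, total_retention, rf)
with_tiering_local = replicated_disk_bytes(ingress, local_retention, rf)
remote_bytes = ingress * total_retention  # single copy in object storage

print(f"broker disk, no tiering:   {without_tiering / TB:.2f} TB")
print(f"broker disk, with tiering: {with_tiering_local / TB:.2f} TB")
print(f"object storage:            {remote_bytes / TB:.2f} TB")
```

Under these assumptions broker disk drops from roughly 389 TB to roughly 13 TB, with about 130 TB held in object storage instead.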

Primary use cases

Long-term Kafka log retention on S3, cost-effective event replay from object storage, decoupling Kafka retention from broker disk capacity.
