Architecture

Event-Driven Ingestion

An architecture pattern where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, webhook callbacks) rather than running on a fixed schedule.

Summary

Where it fits

Event-driven ingestion sits between data sources and the lakehouse write layer. Instead of polling for new data, pipelines react to events — an S3 PutObject notification triggers a Lambda that registers the file in an Iceberg table, or a Kafka consumer writes micro-batches to Delta tables on new message arrival.
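The Lambda side of that flow can be sketched minimally. The handler below only parses the S3 Event Notification payload into (bucket, key) pairs; in a real pipeline the commit to the Iceberg table would happen where the comment indicates, and the `handle_s3_event` name and filtering logic are illustrative assumptions, not a prescribed API.

```python
def handle_s3_event(event):
    """Extract (bucket, key) pairs for newly created objects from an
    S3 Event Notification payload.

    In a real pipeline, each returned file would be registered in the
    Iceberg table here (e.g. via an append commit); this sketch stops
    at parsing the event.
    """
    files = []
    for record in event.get("Records", []):
        # S3 create events are named "ObjectCreated:Put", ":Post", etc.
        if record.get("eventName", "").startswith("ObjectCreated"):
            s3 = record["s3"]
            files.append((s3["bucket"]["name"], s3["object"]["key"]))
    return files
```

The same handler shape works whether the notification arrives directly in Lambda or is fanned out through SQS/SNS first (in the latter case the S3 payload is nested inside the queue message body).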

Misconceptions / Traps
  • Event-driven does not mean real-time. End-to-end latency depends on the event transport (S3 Event Notifications are typically delivered in seconds but can occasionally take longer), processing time, and batching strategy.
  • S3 Event Notifications can be duplicated, delivered out of order, or in rare cases (e.g. concurrent writes to the same key) missed entirely. Pipelines must be idempotent and handle duplicate or missing notifications gracefully.
  • Event-driven architectures trade scheduling simplicity for operational complexity. Dead letter queues, retry policies, and event ordering must be explicitly designed.
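One common way to get the idempotency the traps above demand is to dedupe on the notification's object key plus its `sequencer` field before ingesting. A minimal in-memory sketch, assuming the seen-set would live in a durable store (e.g. DynamoDB) in production; the class name and method are illustrative:

```python
class IdempotentIngestor:
    """Skip S3 notifications that have already been processed.

    Dedupes on (object key, sequencer) — the sequencer field in an
    S3 event record orders events for a given key. In production the
    seen-set must be durable and shared across workers.
    """

    def __init__(self):
        self._seen = set()

    def process(self, record):
        s3 = record["s3"]
        event_id = (s3["object"]["key"], s3["object"].get("sequencer"))
        if event_id in self._seen:
            return False  # duplicate delivery: skip, nothing to redo
        self._seen.add(event_id)
        # ... ingest the object / commit to the table here ...
        return True
```

Because the ingest step runs at most once per (key, sequencer), redelivered notifications become harmless no-ops instead of duplicate table commits.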

Key Connections
  • scoped_to S3, Lakehouse — event-triggered writes to S3-based tables
  • depends_on Kafka Tiered Storage, Redpanda — event transport layer
  • relates_to Batch vs Streaming — event-driven is the streaming alternative to scheduled batch
  • enables CDC into Lakehouse — CDC events trigger lakehouse writes

Definition

What it is

An architecture where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, webhook callbacks) rather than running on fixed schedules, enabling reactive data pipelines.

Why it exists

Scheduled batch ingestion wastes compute when no new data arrives and introduces unnecessary latency when data arrives between schedules. Event-driven ingestion processes data as it appears, optimizing both cost and freshness.

Primary use cases

S3 event notification-triggered ETL, Kafka-driven lakehouse ingestion, reactive data pipelines for irregular data arrival patterns.
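For the Kafka-driven case, a consumer loop typically buffers messages and flushes a micro-batch to the Delta/Iceberg table when either a size or an age threshold is hit, so irregular arrival rates still produce reasonably sized commits. A self-contained sketch of that batching policy (the class name and thresholds are illustrative; the Kafka consumer and table write are omitted):

```python
import time


class MicroBatcher:
    """Buffer events; flush when max_size or max_age_s is reached.

    A Kafka consumer loop would call add() once per message and write
    each returned batch to the lakehouse table as a single commit.
    """

    def __init__(self, max_size=100, max_age_s=5.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_age_s = max_age_s
        self._clock = clock  # injectable for testing
        self._buf = []
        self._opened = None  # time the current batch was started

    def add(self, event):
        if not self._buf:
            self._opened = self._clock()
        self._buf.append(event)
        full = len(self._buf) >= self.max_size
        stale = self._clock() - self._opened >= self.max_age_s
        if full or stale:
            batch, self._buf = self._buf, []
            return batch  # caller commits this batch, then commits offsets
        return None  # keep buffering
```

Committing Kafka offsets only after the table commit succeeds keeps the pipeline at-least-once end to end; combined with idempotent writes, that yields effectively-exactly-once ingestion.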
