Architecture

Event-Driven Ingestion

An architecture pattern where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, webhook callbacks) rather than running on a fixed schedule.

Summary

Where it fits

Event-driven ingestion sits between data sources and the lakehouse write layer. Instead of polling for new data, pipelines react to events — an S3 PutObject notification triggers a Lambda that registers the file in an Iceberg table, or a Kafka consumer writes micro-batches to Delta tables on new message arrival.
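The Lambda side of that flow can be sketched minimally. The handler below only parses the S3 Event Notification payload into (bucket, key) pairs; in a real pipeline the commit to the Iceberg table would happen where the comment indicates, and the `handle_s3_event` name and filtering logic are illustrative assumptions, not a prescribed API.

```python
def handle_s3_event(event):
    """Extract (bucket, key) pairs for newly created objects from an
    S3 Event Notification payload.

    In a real pipeline, each returned file would be registered in the
    Iceberg table here (e.g. via an append commit); this sketch stops
    at parsing the event.
    """
    files = []
    for record in event.get("Records", []):
        # S3 create events are named "ObjectCreated:Put", ":Post", etc.
        if record.get("eventName", "").startswith("ObjectCreated"):
            s3 = record["s3"]
            files.append((s3["bucket"]["name"], s3["object"]["key"]))
    return files
```

The same handler shape works whether the notification arrives directly in Lambda or is fanned out through SQS/SNS first (in the latter case the S3 payload is nested inside the queue message body).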

Misconceptions / Traps
  • Event-driven does not mean real-time. End-to-end latency depends on the event transport (S3 Event Notifications are typically delivered in seconds but can occasionally take longer), processing time, and batching strategy.
  • S3 Event Notifications can be duplicated, delivered out of order, or in rare cases (e.g. concurrent writes to the same key) missed entirely. Pipelines must be idempotent and handle duplicate or missing notifications gracefully.
  • Event-driven architectures trade scheduling simplicity for operational complexity. Dead letter queues, retry policies, and event ordering must be explicitly designed.
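One common way to get the idempotency the traps above demand is to dedupe on the notification's object key plus its `sequencer` field before ingesting. A minimal in-memory sketch, assuming the seen-set would live in a durable store (e.g. DynamoDB) in production; the class name and method are illustrative:

```python
class IdempotentIngestor:
    """Skip S3 notifications that have already been processed.

    Dedupes on (object key, sequencer) — the sequencer field in an
    S3 event record orders events for a given key. In production the
    seen-set must be durable and shared across workers.
    """

    def __init__(self):
        self._seen = set()

    def process(self, record):
        s3 = record["s3"]
        event_id = (s3["object"]["key"], s3["object"].get("sequencer"))
        if event_id in self._seen:
            return False  # duplicate delivery: skip, nothing to redo
        self._seen.add(event_id)
        # ... ingest the object / commit to the table here ...
        return True
```

Because the ingest step runs at most once per (key, sequencer), redelivered notifications become harmless no-ops instead of duplicate table commits.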

Key Connections
  • scoped_to S3, Lakehouse — event-triggered writes to S3-based tables
  • depends_on Kafka Tiered Storage, Redpanda — event transport layer
  • relates_to Batch vs Streaming — event-driven is the streaming alternative to scheduled batch
  • enables CDC into Lakehouse — CDC events trigger lakehouse writes

Definition

What it is

An architecture where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, webhook callbacks) rather than running on fixed schedules, enabling reactive data pipelines.

Why it exists

Scheduled batch ingestion wastes compute when no new data arrives and introduces unnecessary latency when data arrives between schedules. Event-driven ingestion processes data as it appears, optimizing both cost and freshness.

Primary use cases

S3 event notification-triggered ETL, Kafka-driven lakehouse ingestion, reactive data pipelines for irregular data arrival patterns.
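For the Kafka-driven case, a consumer loop typically buffers messages and flushes a micro-batch to the Delta/Iceberg table when either a size or an age threshold is hit, so irregular arrival rates still produce reasonably sized commits. A self-contained sketch of that batching policy (the class name and thresholds are illustrative; the Kafka consumer and table write are omitted):

```python
import time


class MicroBatcher:
    """Buffer events; flush when max_size or max_age_s is reached.

    A Kafka consumer loop would call add() once per message and write
    each returned batch to the lakehouse table as a single commit.
    """

    def __init__(self, max_size=100, max_age_s=5.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_age_s = max_age_s
        self._clock = clock  # injectable for testing
        self._buf = []
        self._opened = None  # time the current batch was started

    def add(self, event):
        if not self._buf:
            self._opened = self._clock()
        self._buf.append(event)
        full = len(self._buf) >= self.max_size
        stale = self._clock() - self._opened >= self.max_age_s
        if full or stale:
            batch, self._buf = self._buf, []
            return batch  # caller commits this batch, then commits offsets
        return None  # keep buffering
```

Committing Kafka offsets only after the table commit succeeds keeps the pipeline at-least-once end to end; combined with idempotent writes, that yields effectively-exactly-once ingestion.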
