Event-Driven Ingestion
An architecture pattern where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, webhook callbacks) rather than running on a fixed schedule.
Summary
Event-driven ingestion sits between data sources and the lakehouse write layer. Instead of polling for new data, pipelines react to events — an S3 PutObject notification triggers a Lambda that registers the file in an Iceberg table, or a Kafka consumer writes micro-batches to Delta tables on new message arrival.
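The S3-notification-to-Lambda path described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the standard S3 event notification record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`), and `register_data_file` is a hypothetical placeholder for whatever actually commits the file to an Iceberg table.

```python
from urllib.parse import unquote_plus


def extract_new_objects(event: dict) -> list[dict]:
    """Return one descriptor per created object named in an S3 notification."""
    found = []
    for record in event.get("Records", []):
        # React only to object-creation events (Put, Post, Copy, multipart complete).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        found.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in the payload (spaces become "+").
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
            "sequencer": s3["object"].get("sequencer", ""),
        })
    return found


def register_data_file(bucket: str, key: str) -> None:
    """Hypothetical placeholder: a real pipeline would append the file to a table here."""
    print(f"would register s3://{bucket}/{key}")


def handler(event, context):
    """Lambda entry point: register every newly created object in the event."""
    objects = extract_new_objects(event)
    for obj in objects:
        register_data_file(obj["bucket"], obj["key"])
    return {"registered": len(objects)}
```

Keeping the parsing in `extract_new_objects` separate from the write step makes the handler easy to exercise locally with a hand-built event dict before wiring it to a real bucket.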
- Event-driven does not mean real-time. End-to-end latency depends on the event transport (S3 Event Notifications are typically delivered within seconds but can occasionally take a minute or longer), processing time, and batching strategy.
- S3 Event Notifications can be lost or delivered out of order. Pipelines must be idempotent and handle duplicate or missing notifications gracefully.
- Event-driven architectures trade scheduling simplicity for operational complexity. Dead letter queues, retry policies, and event ordering must be explicitly designed.
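One way to make a consumer tolerate duplicate and out-of-order notifications is to track, per object, the newest event already applied, and to park failed writes rather than drop them. The sketch below is an assumption-laden illustration: `IngestLedger` is a hypothetical in-memory stand-in for a durable store, the dead-letter list stands in for a real queue, and the ordering check assumes the S3 `sequencer` field can be compared as a hex string after right-padding the shorter value with zeros.

```python
class IngestLedger:
    """In-memory stand-in for a durable dedupe store (hypothetical)."""

    def __init__(self):
        self._latest = {}  # (bucket, key) -> newest sequencer already applied

    @staticmethod
    def _is_newer(candidate: str, current: str) -> bool:
        # Right-pad the shorter hex string with zeros, then compare lexicographically.
        width = max(len(candidate), len(current))
        return candidate.ljust(width, "0").upper() > current.ljust(width, "0").upper()

    def is_new(self, bucket: str, key: str, sequencer: str) -> bool:
        """False for duplicates and for events older than one already applied."""
        current = self._latest.get((bucket, key))
        return current is None or self._is_newer(sequencer, current)

    def mark_applied(self, bucket: str, key: str, sequencer: str) -> None:
        self._latest[(bucket, key)] = sequencer


def process_batch(records, ledger, write_fn, dead_letters):
    """Apply each record at most once; park failures instead of dropping them."""
    applied = 0
    for rec in records:
        ident = (rec["bucket"], rec["key"], rec["sequencer"])
        if not ledger.is_new(*ident):
            continue  # duplicate or stale notification: safe to skip
        try:
            write_fn(rec)
        except Exception as exc:
            # Not marked as applied, so a later retry of the same event still runs.
            dead_letters.append({"record": rec, "error": str(exc)})
            continue
        ledger.mark_applied(*ident)
        applied += 1
    return applied
```

Marking the ledger only after a successful write is the design choice that keeps retries effective. Missing notifications are not addressed here; a periodic listing of the bucket compared against the ledger is one common way to backfill them.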
scoped_to: S3, Lakehouse (event-triggered writes to S3-based tables)
depends_on: Kafka Tiered Storage, Redpanda (event transport layer)
relates_to: Batch vs Streaming (event-driven is the streaming alternative to scheduled batch)
enables: CDC into Lakehouse (CDC events trigger lakehouse writes)
Definition
An architecture where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, webhook callbacks) rather than running on fixed schedules, enabling reactive data pipelines.
Scheduled batch ingestion wastes compute when no new data arrives and introduces unnecessary latency when data arrives between schedules. Event-driven ingestion processes data as it appears, optimizing both cost and freshness.
Typical use cases include S3 event notification-triggered ETL, Kafka-driven lakehouse ingestion, and reactive data pipelines for irregular data arrival patterns.
Recent developments
- S3 → EventBridge → Lambda is the 2026 canonical event-driven ingestion shape. S3 notifications route through EventBridge instead of (or in addition to) direct Lambda — EventBridge adds powerful filtering + routing + fan-out + DLQs. The "S3 directly invokes Lambda" pattern is now considered legacy. Per CloudThat — Building Real-Time Amazon S3 to AWS Lambda Triggers Using EventBridge.
- Production serverless ETL stack: S3 + EventBridge + Lambda + Step Functions + Glue. Full reference architecture: S3 PUT → EventBridge filter → Lambda processor → Step Functions orchestrator (validation, quarantine, retry) → Glue (schema drift, large transforms). Covers event-driven ingestion end-to-end + replay. Per DEV — Serverless ETL/ELT Architecture with S3, EventBridge, Lambda, Step Functions, Glue.
- EventBridge 256 KB payload limit → Claim-Check pattern as the standard workaround. When event payloads exceed the EventBridge limit, store the actual blob in S3 + pass an S3 reference through the event bus. The Claim-Check pattern is now the documented escape hatch. Per Medium — The Secret Life of AWS: Event-Driven Architecture (March 2026).
- EventBridge Schema Registry auto-discovers + versions event structures. Auto-discovered event schemas with version tracking — downstream Lambda consumers get IDE auto-complete + type generation. Closes the "I don't know what fields this event has" gap that historically slowed event-driven adoption. Per CyberPanel — How AWS EventBridge Builds Event-Driven Architecture 2026.
- Native DLQ support reduces data loss when targets are offline. EventBridge supports a Dead Letter Queue per rule target: if delivery to a target still fails after the retry policy is exhausted, the event is sent to an SQS queue and retained for later inspection and redelivery. This softens, but does not eliminate, the "event-driven means accepting some loss" tradeoff, because events that never reach the bus are not covered. Per CyberPanel, EventBridge Event-Driven Architecture 2026.
- Bedrock Knowledge Bases auto-sync pattern uses this stack end-to-end. AWS published a reference implementation: S3 PUT → EventBridge → Lambda → Bedrock KB ingestion API → embeddings refresh. The event-driven-ingestion → RAG-corpus refresh shape is now a documented AWS playbook. Per AWS ML Blog — Build and Deploy Automatic Sync Solution for Amazon Bedrock Knowledge Bases.
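The Claim-Check pattern mentioned in the list above can be sketched as a pair of helpers. This is an illustration under stated assumptions: `blob_store` is a plain dict standing in for S3, the bucket name and key layout are hypothetical, and `MAX_EVENT_BYTES` uses the 256 KB figure quoted above (the real limit applies to the whole event entry and should be checked against current service quotas).

```python
import json
import uuid

# Limit as quoted in the text above; treat as an assumption, not a guaranteed quota.
MAX_EVENT_BYTES = 256 * 1024


def to_event_detail(payload: dict, blob_store: dict,
                    bucket: str = "claim-check-bucket") -> dict:
    """Return the payload inline if it fits, otherwise store it and return a reference."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= MAX_EVENT_BYTES:
        return {"inline": payload}
    key = f"claims/{uuid.uuid4()}.json"  # hypothetical key layout
    blob_store[(bucket, key)] = body     # stand-in for an S3 object upload
    return {"claim_check": {"bucket": bucket, "key": key, "size": len(body)}}


def from_event_detail(detail: dict, blob_store: dict) -> dict:
    """Consumer side: resolve the reference back into the original payload."""
    if "inline" in detail:
        return detail["inline"]
    ref = detail["claim_check"]
    return json.loads(blob_store[(ref["bucket"], ref["key"])])
```

The producer publishes whatever `to_event_detail` returns as the event detail, and the consumer calls `from_event_detail` before processing, so both small and oversized payloads follow the same code path.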
Resources
S3 Event Notifications documentation for triggering Lambda, SQS, or SNS workflows when objects are created, deleted, or modified.
AWS Lambda S3 trigger documentation for building serverless event-driven ingestion pipelines from S3 to downstream systems.
EventBridge S3 integration guide for routing S3 events to multiple targets with filtering, transformation, and fan-out patterns.