Pain Point

Legacy Ingestion Bottlenecks

Older ETL systems designed for HDFS or traditional databases that cannot efficiently write to modern S3-based lakehouse architectures.


Summary

Where it fits

This pain point is the migration friction between the old world (Hadoop, RDBMS, batch ETL) and the new world (S3 lakehouse). It slows adoption and forces dual-system operation during transitions.

Misconceptions / Traps
  • "Lift and shift" rarely works. Legacy ETL tools produce formats, file sizes, and write patterns incompatible with lakehouse best practices.
  • CDC (Change Data Capture) is the modern replacement for batch ETL, but it introduces its own complexity (Debezium, Kafka, schema registries).
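One common stepping stone toward full log-based CDC (Debezium reading the database transaction log) is query-based incremental extraction driven by a high watermark. A minimal sketch using an in-memory SQLite table; the table and column names (`orders`, `updated_at`) and the `incremental_pull` helper are illustrative, not part of any specific tool:

```python
import sqlite3

# Hypothetical source table standing in for a legacy RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, updated_at INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 100), (2, 150), (3, 200)])

def incremental_pull(conn, watermark):
    """Fetch only rows changed since the last watermark (query-based CDC)."""
    rows = conn.execute(
        "SELECT id, updated_at FROM orders WHERE updated_at > ? "
        "ORDER BY updated_at", (watermark,)).fetchall()
    # Advance the watermark to the newest change seen in this batch.
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark

rows, wm = incremental_pull(conn, 120)  # picks up rows 2 and 3, wm -> 200
```

Unlike log-based CDC, this approach misses hard deletes and intermediate updates between pulls, which is exactly the complexity gap that tools like Debezium exist to close.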
Key Connections
  • Apache Ozone solves Legacy Ingestion Bottlenecks — HDFS migration path
  • Apache Hudi solves Legacy Ingestion Bottlenecks — incremental ingestion primitives
  • Medallion Architecture constrained_by Legacy Ingestion Bottlenecks — Bronze layer receives legacy data
  • Legacy Ingestion Bottlenecks scoped_to Data Lake, S3
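The write-pattern mismatch behind these connections is concrete: HDFS-era pipelines often emit thousands of small files, while S3 lakehouse tables perform best with large files (commonly targeted around 128 MB, which is also why Hudi and similar table formats ship compaction). A stdlib-only sketch of the coalescing plan an ingestion layer needs; the target size and `plan_coalesce` name are illustrative assumptions:

```python
TARGET_FILE_BYTES = 128 * 1024 * 1024  # a common lakehouse target file size

def plan_coalesce(file_sizes, target=TARGET_FILE_BYTES):
    """Greedily group small input files into write batches near the target size."""
    batches, current, current_size = [], [], 0
    for size in file_sizes:
        # Flush the current batch once adding the next file would overshoot.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

# 1000 legacy 1 MB files collapse into 8 write batches instead of 1000 objects.
batches = plan_coalesce([1024 * 1024] * 1000)
```

Without a step like this, each small legacy file becomes one S3 object and one table-format file-group entry, which is the read- and metadata-amplification that makes "lift and shift" ingestion slow.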

Connections: 13 (2 outbound, 11 inbound)

Resources: 3