Pain Point

Legacy Ingestion Bottlenecks

Summary

What it is

Older ETL systems and ingestion pipelines that were designed for HDFS or traditional databases and cannot write efficiently to modern S3-based lakehouse architectures.

Where it fits

This pain point is the migration friction between the old world (Hadoop, RDBMS, batch ETL) and the new world (S3 lakehouse). It slows adoption and forces dual-system operation during transitions.

Misconceptions / Traps

  • "Lift and shift" rarely works. Legacy ETL tools produce formats, file sizes, and write patterns incompatible with lakehouse best practices.
  • CDC (Change Data Capture) is the modern replacement for batch ETL, but it introduces its own complexity (Debezium, Kafka, schema registries).
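
To make the file-sizing trap concrete, here is a minimal PySpark sketch that compacts a small-file legacy export into lakehouse-sized Parquet on S3. The paths, the 128 MB target, and the input size are assumptions for illustration, not details from this entry, and the job presumes S3 credentials and the hadoop-aws package are configured.

```python
# Minimal PySpark sketch: compact a legacy export made of many small files
# into lakehouse-sized Parquet on S3. Paths, sizes, and input volume are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("legacy-compaction-sketch").getOrCreate()

# Hypothetical legacy dump: thousands of small CSV files from an old batch ETL job.
df = spark.read.option("header", "true").csv("s3a://legacy-dumps/orders/")

# Aim for output files around 128 MB instead of the legacy small-file pattern.
target_file_bytes = 128 * 1024 * 1024
approx_input_bytes = 10 * 1024 * 1024 * 1024   # assumed ~10 GB of input
num_files = max(1, approx_input_bytes // target_file_bytes)

(df.repartition(int(num_files))
   .write
   .mode("overwrite")
   .parquet("s3a://lakehouse-bronze/orders/"))
```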

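Where CDC replaces batch ETL, a common pattern is to stream Debezium change events from Kafka and upsert them into an Apache Hudi table. The sketch below is a hedged, minimal version of that pattern: the broker address, topic name, column schema, and table paths are assumptions, deletes and schema evolution are ignored, and a real pipeline would typically add a schema registry.

```python
# Minimal sketch of CDC ingestion: Debezium change events read from Kafka and
# upserted into an Apache Hudi table on S3 with Spark Structured Streaming.
# Broker, topic, schema, and paths are illustrative assumptions; the job also
# assumes the spark-sql-kafka and hudi-spark bundles are on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-to-hudi-sketch").getOrCreate()

# Assumed shape of the Debezium "after" image for a hypothetical orders table,
# with the JsonConverter and schemas disabled (so "after" sits at the top level).
after_schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", LongType()),   # epoch value from the source; used to precombine
])
envelope_schema = StructType([StructField("after", after_schema)])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")   # assumed broker
       .option("subscribe", "pg.public.orders")           # assumed Debezium topic
       .load())

# Keep only the post-change row image; deletes (null "after") are skipped in this sketch.
changes = (raw
           .select(from_json(col("value").cast("string"), envelope_schema).alias("e"))
           .select("e.after.*")
           .where(col("order_id").isNotNull()))

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

def upsert_batch(batch_df, batch_id):
    # Each micro-batch is upserted into the Hudi table by record key.
    (batch_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3a://lakehouse-bronze/orders_cdc/"))

query = (changes.writeStream
         .foreachBatch(upsert_batch)
         .option("checkpointLocation", "s3a://lakehouse-bronze/_checkpoints/orders_cdc/")
         .start())
query.awaitTermination()
```
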
Key Connections

  • Apache Ozone solves this pain point by providing an HDFS migration path.
  • Apache Hudi solves this pain point with incremental ingestion primitives (see the incremental-read sketch after this list).
  • Medallion Architecture is constrained by this pain point: the Bronze layer is where legacy data lands.
  • Scoped to: Data Lake, S3.
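
To make the Hudi connection concrete, the sketch below shows Hudi's incremental query primitive, which lets downstream jobs pull only the records committed after a given instant instead of rescanning the whole Bronze table. The table path and the begin instant are assumptions for illustration.

```python
# Minimal sketch of Hudi's incremental ingestion primitive: read only the
# records committed after a given instant. Path and instant are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-incremental-sketch").getOrCreate()

incremental = (spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    # Pull commits made after this (assumed) instant, e.g. a bookmark saved by the last run.
    .option("hoodie.datasource.read.begin.instanttime", "20240101000000")
    .load("s3a://lakehouse-bronze/orders_cdc/"))

# Downstream (e.g. Silver-layer) jobs then process only the changed rows.
incremental.show()
```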
