Flink CDC
Apache Flink connectors for reading database change logs (MySQL binlog, PostgreSQL WAL) and streaming them directly into lakehouse formats on S3 without an intermediate message broker.
Summary
Flink CDC removes Kafka from the CDC pipeline. Instead of Database → Debezium → Kafka → Flink → S3, the architecture becomes Database → Flink CDC → S3. This reduces latency, operational complexity, and infrastructure costs for database-to-lakehouse replication.
- Eliminating Kafka also eliminates its replay buffer. If the Flink job fails, replay must come from the database logs, which may have limited retention.
- Memory usage can be significant under high-throughput workloads. Capacity planning for Flink CDC is critical.
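The retention caveat above can be turned into a simple operational check. The Python sketch below is illustrative only: the function name, the `safety_margin` policy, and the example durations are assumptions, not part of Flink CDC. It asks whether an outage of a given length still leaves the job's last checkpointed log position inside the database's log retention window.

```python
from datetime import timedelta


def can_resume_from_logs(downtime: timedelta,
                         log_retention: timedelta,
                         safety_margin: float = 0.5) -> bool:
    """Return True if the outage fits inside the allowed share of log retention.

    safety_margin is a hypothetical policy knob: the fraction of the
    retention window an outage may consume before a resume is treated
    as unsafe and a fresh snapshot should be planned instead.
    """
    if not 0 < safety_margin <= 1:
        raise ValueError("safety_margin must be in (0, 1]")
    return downtime <= log_retention * safety_margin


# Assumed example: 7 days of log retention, half of it usable as outage budget.
print(can_resume_from_logs(timedelta(days=2), timedelta(days=7)))  # True
print(can_resume_from_logs(timedelta(days=5), timedelta(days=7)))  # False
```

With Kafka in the pipeline this budget would be set by topic retention; without it, the source database's own log retention is the only buffer, so that setting becomes part of the pipeline's recovery plan.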
depends_on: Apache Flink (runs as Flink connectors)
enables: Apache Paimon, Apache Iceberg, Apache Hudi (writes CDC data directly to lakehouse formats)
scoped_to: Table Formats (ingestion framework for S3-based table formats)
Definition
A set of Apache Flink connectors that read database change logs (MySQL binlog, PostgreSQL WAL, MongoDB oplog) and stream them directly into lakehouse table formats on S3, without requiring an intermediate message broker.
Traditional CDC pipelines require Kafka or a similar message queue between the source database and the lake. Flink CDC eliminates this intermediate layer by reading change logs directly and writing to Iceberg, Paimon, or Hudi on S3, reducing operational complexity and latency.
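To make "reading change logs and writing to a table" concrete, here is a minimal Python sketch of how a stream of change events can be folded into a mirrored table. It is a conceptual model, not Flink CDC code: the event shape (`op`, `before`, `after`, with `c`/`r`/`u`/`d` operation codes) is modeled on the Debezium-style envelope, and the `apply_changes` helper and the `id` key column are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(table: Dict[Any, Row],
                  events: Iterable[Dict[str, Any]],
                  key: str = "id") -> Dict[Any, Row]:
    """Apply insert/update/delete events to an in-memory table keyed by `key`."""
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, and update all upsert the "after" image
            row = event["after"]
            table[row[key]] = row
        elif op == "d":
            # delete removes the row identified by the "before" image
            table.pop(event["before"][key], None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return table


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]
mirror = apply_changes({}, events)
print(mirror)  # {1: {'id': 1, 'status': 'paid'}}
```

The sketch ignores primary-key changes and ordering guarantees; in a real pipeline the table format on S3 (Iceberg, Paimon, or Hudi) handles the upsert and delete semantics.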
Use cases: database-to-lakehouse replication without Kafka; real-time data mirroring from operational databases to S3; streaming CDC ingestion into Iceberg or Paimon tables.
Recent developments
- Sub-second end-to-end latency, no-Kafka pipelines positioned as the primary CDC option. Per the official Flink CDC docs, the current release ships an incremental snapshot algorithm (no source-database lock), schema evolution with automatic downstream table creation and DDL application, a full streaming pipeline with sub-second end-to-end latency, and SQL-shaped transformations (projection, filtering, computed columns). The "skip the Kafka hop" framing is now the canonical reason to pick Flink CDC over Debezium + Kafka Connect for Iceberg/Paimon/Hudi sinks.
- Operational tradeoff documented honestly in CDC tooling surveys. Per RisingWave's CDC tools comparison (April 2026), Flink CDC's strengths (sub-second latency, no Kafka required, full Java/SQL transformations) come with real costs: JVM expertise required, checkpoint configuration, RocksDB state backend tuning, JobManager / TaskManager cluster ops, and a SQL surface area narrower than RisingWave's. Decision framing: pick Flink CDC when transformation capability and operational depth justify the JVM overhead; pick a SQL-shaped streaming database when the workload fits the SQL surface area cleanly.
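The incremental snapshot mentioned in the first item avoids a table lock by reading the table in bounded pieces rather than in one long scan. The Python sketch below shows only the range-splitting idea for an integer primary key. The function name, the integer-key assumption, and the chunk size are illustrative and not taken from the Flink CDC codebase; the real algorithm also reconciles each chunk with change-log events that arrive while the chunk is being read, which is omitted here.

```python
from typing import List, Tuple


def split_into_chunks(min_key: int, max_key: int,
                      chunk_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_key, max_key] into half-open chunks [start, end)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[Tuple[int, int]] = []
    start = min_key
    while start <= max_key:
        # the last chunk is clipped so it never runs past max_key
        end = min(start + chunk_size, max_key + 1)
        chunks.append((start, end))
        start = end
    return chunks


# Each chunk maps to a bounded query such as:
#   SELECT * FROM t WHERE id >= start AND id < end
chunks = split_into_chunks(1, 10, 4)
print(chunks)  # [(1, 5), (5, 9), (9, 11)]
```

Because each chunk is a short, independent read, chunks can be processed in parallel and retried individually after a failure, which is part of why the checkpoint and state tuning listed in the second item matters in practice.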
Resources
Official Flink CDC documentation covering supported databases, connector configuration, and pipeline setup.
Source repository with connector implementations, version compatibility matrix, and migration guides.
Engineering guide to CDC strategies for Iceberg covering Flink CDC as a Kafka-free alternative.