DuckLake
A lakehouse metadata format that stores table metadata in an embedded SQL database (DuckDB) instead of file-based manifests on S3. Emerging project from the DuckDB team.
Summary
DuckLake challenges the Iceberg/Delta approach of storing metadata as JSON and Avro files on S3. By placing metadata in a SQL database, it eliminates the metadata file listing and parsing overhead that plagues large Iceberg tables — while keeping data files in Parquet on S3. It is the natural extension of DuckDB's "zero-infrastructure" philosophy to the lakehouse metadata layer.
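To make the metadata-overhead claim concrete, here is a toy request-count model (illustrative numbers and function names are assumptions, not measurements): a manifest-based commit in an Iceberg-style design writes a new root metadata file, a manifest list, and one manifest per changed partition to S3, while a SQL-backed catalog issues no object-store requests for metadata at all.

```python
def manifest_commit_requests(changed_partitions: int) -> int:
    # Simplified Iceberg-style model: 1 PUT for the new root metadata
    # file, 1 PUT for the manifest list, 1 PUT per rewritten manifest.
    return 2 + changed_partitions

def sql_catalog_commit_requests(changed_partitions: int) -> int:
    # Metadata lands in the catalog database instead of S3. Data-file
    # PUTs are identical in both designs and are excluded here.
    return 0

print(manifest_commit_requests(10))    # 12 metadata PUTs against S3
print(sql_catalog_commit_requests(10))  # 0
```

The gap widens on the read path, where planning a query against a large manifest-based table can require dozens of GETs before the first data file is touched.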
- DuckLake is early-stage (2025). It is not a production-ready replacement for Iceberg or Delta Lake. Evaluate for experimentation and single-node workflows, not mission-critical multi-engine environments.
- The metadata database (DuckDB, PostgreSQL, MySQL) becomes a stateful dependency, trading away part of the "no server needed" benefit of file-based table formats.
- Multi-engine support is limited. DuckLake is tightly coupled to DuckDB today — unlike Iceberg, which works across Spark, Trino, Flink, and others.
- depends_on: DuckDB — uses DuckDB as the embedded metadata engine
- alternative_to: Apache Iceberg — SQL-based metadata vs file-based manifests
- solves: Metadata Overhead at Scale — eliminates file-based metadata listing overhead
- solves: Request Amplification — metadata queries replace S3 LIST and GET operations
Definition
An emerging open table format that stores lakehouse metadata in an embedded SQL database (DuckDB) rather than the file-based manifests used by Iceberg, Delta, and Hudi. Provides instant commit cycles by avoiding S3 round-trips for metadata operations.
File-based table formats store metadata as Parquet/JSON/Avro manifests on S3. Every commit requires multiple PUT operations and every query plan requires multiple GET operations against these manifests. DuckLake replaces this with a local SQL database, eliminating the metadata I/O bottleneck entirely.
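The core idea can be sketched with an ordinary SQL database standing in for the catalog. This is a minimal illustration using Python's stdlib sqlite3 (the table schema, function names, and paths are hypothetical; DuckLake's real schema is defined by its specification): a commit becomes one local transaction, and query planning becomes one SELECT over metadata tables.

```python
import sqlite3

# Toy SQL-backed metadata catalog (assumed schema, for illustration only).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE data_files (
        table_name  TEXT,
        snapshot_id INTEGER,
        path        TEXT,
        row_count   INTEGER
    )
""")

def commit_snapshot(con, table_name, snapshot_id, files):
    # A commit is a single local transaction: no manifest PUTs to S3.
    with con:
        con.executemany(
            "INSERT INTO data_files VALUES (?, ?, ?, ?)",
            [(table_name, snapshot_id, path, rows) for path, rows in files],
        )

def plan_scan(con, table_name, snapshot_id):
    # Planning is a single SQL query: no manifest GETs from S3.
    rows = con.execute(
        "SELECT path FROM data_files"
        " WHERE table_name = ? AND snapshot_id = ? ORDER BY path",
        (table_name, snapshot_id),
    ).fetchall()
    return [path for (path,) in rows]

commit_snapshot(con, "events", 1, [
    ("s3://bucket/events/a.parquet", 1000),
    ("s3://bucket/events/b.parquet", 2000),
])
print(plan_scan(con, "events", 1))
```

Only the data files themselves remain on S3 as Parquet; everything a planner needs to find them is answered by the database.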
Best suited to low-latency lakehouse metadata operations, interactive data exploration without metadata scan overhead, and single-node lakehouse workflows where DuckDB is the primary engine.
Resources
Announcement post explaining DuckLake's SQL-first metadata approach as an alternative to file-based catalogs like Iceberg's REST catalog.
Source code and specification for the DuckDB-native lakehouse format that stores catalog metadata in a database instead of manifest files on S3.