Delta Lake
An open table format and storage layer providing ACID transactions, scalable metadata, and schema enforcement on data stored in object storage. Originally developed at Databricks.
Summary
Delta Lake is the table format native to the Databricks ecosystem. It competes with Iceberg and Hudi but has the strongest integration with Spark-based platforms. On S3, Delta Lake requires external coordination for atomic commits due to the lack of atomic rename.
- Delta Lake on S3 requires a DynamoDB-based log store or equivalent for multi-writer safety. Without it, concurrent writes can corrupt the transaction log.
- "Delta" and "Databricks" are closely associated, but the Delta format and core implementation are open source; some advanced features (liquid clustering, predictive optimization) remain Databricks-proprietary.
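The multi-writer setup mentioned above is wired up through Delta's LogStore configuration. A sketch of the relevant Spark properties, following Delta's storage configuration documentation; the table name and region here are placeholder values, and the `delta-storage-s3-dynamodb` artifact must be on the classpath:

```properties
# Route commits for s3a:// paths through the DynamoDB-backed log store,
# which supplies the put-if-absent commit semantics S3 lacks natively.
spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore

# DynamoDB table used to coordinate commits across writers
# (placeholder table name and region).
spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName=delta_log
spark.io.delta.storage.S3DynamoDBLogStore.ddb.region=us-east-1
```

Without this coordination layer, two Spark clusters writing the same table can both believe they committed the same log version, corrupting the transaction log.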
- implements: Lakehouse Architecture — provides ACID on data lakes
- depends_on: Delta Lake Protocol, Apache Parquet — protocol spec and data format
- solves: Schema Evolution — schema enforcement with evolution support
- constrained_by: Vendor Lock-In (Databricks ecosystem affinity), Lack of Atomic Rename (S3 limitation)
- scoped_to: Table Formats, Lakehouse
Definition
An open table format and storage layer that brings ACID transactions, scalable metadata handling, and schema enforcement to data stored on object storage.
Purpose: to enable reliable data pipelines on data lakes by providing transaction guarantees that raw file storage lacks. Originally developed at Databricks to address data quality and consistency problems in Spark-based pipelines.
Use cases: ACID-compliant data lakes, streaming and batch unification, audit-ready data pipelines, time-travel queries.
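Both the transaction guarantees and time travel fall out of Delta's versioned `_delta_log`: each commit is a zero-padded JSON file of add/remove actions, and a snapshot at version N is the replay of commits 0..N. A minimal sketch of that idea in plain Python — the file naming matches the Delta protocol, but `write_commit` and `snapshot_files` are simplified stand-ins, not the real implementation:

```python
import json
import os
import tempfile

def commit_filename(version: int) -> str:
    # Delta commit files are the version number zero-padded to 20 digits.
    return f"{version:020d}.json"

def write_commit(log_dir: str, version: int, actions: list) -> None:
    path = os.path.join(log_dir, commit_filename(version))
    # Delta requires put-if-absent semantics here. On a local filesystem,
    # open(..., "x") provides it; on S3 this is exactly what the
    # DynamoDB-based log store supplies, since a plain S3 PUT would let
    # two writers both "win" version N.
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")

def snapshot_files(log_dir: str, as_of_version: int) -> set:
    # Time travel: replay add/remove actions up to the requested version.
    files = set()
    for v in range(as_of_version + 1):
        with open(os.path.join(log_dir, commit_filename(v))) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files

log_dir = tempfile.mkdtemp()
write_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(log_dir, 1, [{"remove": {"path": "part-0.parquet"}},
                          {"add": {"path": "part-1.parquet"}}])
print(snapshot_files(log_dir, 0))  # → {'part-0.parquet'}
print(snapshot_files(log_dir, 1))  # → {'part-1.parquet'}
```

A query "as of" an older version simply stops the replay earlier, which is why time travel needs no extra copies of the data: the Parquet files are immutable and only the log decides which ones are visible.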
Connections 12
Outbound 8: scoped_to 2, implements 1, depends_on 2, solves 1, constrained_by 2
Inbound 4: depends_on 1
Resources 4
Official Delta Lake documentation covering table protocol, API usage with Spark/Flink/Trino, and storage configuration including S3.
Primary Delta Lake open-source repository maintained by Databricks and the community, including the protocol spec and Spark connector.
The Delta Lake protocol specification defines the transaction log format and storage requirements critical for S3-based Delta tables.
Delta Lake's storage configuration documentation covers S3 multi-cluster writes, DynamoDB-based log store, and credentials setup.