Technology

Delta Lake

Summary

What it is

An open table format and storage layer providing ACID transactions, scalable metadata, and schema enforcement on data stored in object storage. Originally developed at Databricks.

Where it fits

Delta Lake is the table format native to the Databricks ecosystem. It competes with Apache Iceberg and Apache Hudi but has the deepest integration with Spark-based platforms. Because S3 lacks an atomic rename operation, Delta Lake needs an external coordination mechanism to make commits on S3 atomic.
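Why commits need coordination can be shown with a toy model (an assumption-laden sketch, not the real Delta protocol): two writers race to publish the same table version, and correctness requires an atomic "put if absent" so exactly one wins. On S3 that primitive has to come from outside (e.g. a DynamoDB conditional write); the in-memory lock below merely stands in for it.

```python
import threading

# Toy in-memory commit log (assumption: simplified model, not Delta's code).
# The lock stands in for an external atomic primitive such as a DynamoDB
# conditional put; plain S3 rename cannot provide this guarantee.
class Log:
    def __init__(self):
        self._commits = {}             # version -> commit payload
        self._lock = threading.Lock()

    def put_if_absent(self, version, payload):
        """Publish `version` only if no one else has; return True on win."""
        with self._lock:
            if version in self._commits:
                return False           # lost the race: retry at a later version
            self._commits[version] = payload
            return True

log = Log()
log.put_if_absent(0, "create table")
# Two writers both try to commit version 1; exactly one may succeed.
winners = [log.put_if_absent(1, f"writer-{i}") for i in range(2)]
```

The losing writer is expected to re-read the latest version and retry its commit at version 2, which is how optimistic concurrency works in Delta's protocol.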

Misconceptions / Traps

  • Delta Lake on S3 requires a DynamoDB-based log store or equivalent for multi-writer safety. Without it, concurrent writes can corrupt the transaction log.
  • "Delta" and "Databricks" are closely associated, but Delta is open-source. However, some advanced features (liquid clustering, predictive optimization) are Databricks-proprietary.

Key Connections

  • implements Lakehouse Architecture — provides ACID on data lakes
  • depends_on Delta Lake Protocol, Apache Parquet — protocol spec and data format
  • solves Schema Evolution — schema enforcement with evolution support
  • constrained_by Vendor Lock-In (Databricks ecosystem affinity), Lack of Atomic Rename (S3 limitation)
  • scoped_to Table Formats, Lakehouse

Definition

What it is

An open table format and storage layer that brings ACID transactions, scalable metadata handling, and schema enforcement to data stored on object storage.

Why it exists

To enable reliable data pipelines on data lakes by providing transaction guarantees that raw file storage lacks. Originally developed at Databricks to address data quality and consistency problems in Spark-based pipelines.

Primary use cases

ACID-compliant data lakes, streaming and batch unification, audit-ready data pipelines, time-travel queries.
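Time travel falls out of the transaction-log design: each commit is a numbered JSON file of add/remove actions, and reading version N means replaying commits 0..N. A runnable pure-Python sketch of that idea (assumption: a deliberately simplified model of `_delta_log`, ignoring checkpoints, metadata, and the full action schema):

```python
import json
import os
import tempfile

def snapshot_at(log_dir, version):
    """Return the set of data files visible at the given table version
    by replaying commit files 0..version (simplified Delta log model)."""
    live = set()
    for v in range(version + 1):
        path = os.path.join(log_dir, f"{v:020d}.json")  # Delta-style numbering
        with open(path) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live

# Demo: a tiny fake log with three commits.
log = tempfile.mkdtemp()
commits = [
    [{"add": {"path": "a.parquet"}}],
    [{"add": {"path": "b.parquet"}}],
    [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
]
for v, actions in enumerate(commits):
    with open(os.path.join(log, f"{v:020d}.json"), "w") as f:
        f.write("\n".join(json.dumps(a) for a in actions))

v1 = snapshot_at(log, 1)  # files live as of version 1
v2 = snapshot_at(log, 2)  # files live as of version 2 (a.parquet removed)
```

In Spark SQL the same capability surfaces as `SELECT * FROM tbl VERSION AS OF 1`; the removed file's data stays queryable at older versions until it is vacuumed.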

Relationships

Resources