Databricks
A unified data + AI platform built on Apache Spark and Delta Lake, with a managed lakehouse covering data engineering, SQL analytics, ML/AI, and (as of 2026) operational application data via **Lakebase**. Originated the lakehouse architecture pattern.
Summary
Databricks is the commercial lakehouse-platform layer above S3-compatible object storage. The platform bundles cluster management, Delta Lake transactions, Unity Catalog governance, and a managed runtime, so operators can treat S3 as the system of record without managing the substrate themselves. With Lakebase (May 2026 GA), the platform also serves operational application data colocated with the analytical lakehouse, positioning a single product class to remove the wall between operational and analytical systems.
- "Open formats = portable" is partially true. Delta Lake and Iceberg are open, but Photon engine, Unity Catalog deep integration, and Lakebase are platform-stickiness layers. Migration cost is real even with open table formats.
- Lakebase is not a relational DB replacement; it's an OLTP-shaped surface layered over the analytical lakehouse. Don't expect classical RDBMS features (referential integrity enforced by FKs, complex stored procedures, etc.).
- Databricks pricing is consumption-based, billed in Databricks Units (DBUs). Poorly tuned workloads or always-on clusters can cost more than self-managed Spark on S3.
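To make the consumption-pricing caveat concrete, here is a minimal back-of-the-envelope sketch. Every number in it (`dbus_per_hour`, `dbu_rate_usd`, `vm_cost_per_hour_usd`) is a hypothetical placeholder rather than a published Databricks or cloud price, and `ClusterProfile` and `monthly_cost` are names invented for this illustration. It only shows how running hours multiply the bill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical cost inputs; none of these are real published prices."""
    dbus_per_hour: float         # DBUs consumed per running hour (assumed)
    dbu_rate_usd: float          # price per DBU in USD (assumed)
    vm_cost_per_hour_usd: float  # underlying cloud VM cost per hour (assumed)

    def hourly_cost(self) -> float:
        # Platform charge (DBUs) plus the cloud provider's VM charge.
        return self.dbus_per_hour * self.dbu_rate_usd + self.vm_cost_per_hour_usd


def monthly_cost(profile: ClusterProfile, running_hours_per_day: float, days: int = 30) -> float:
    """Total cost for a cluster that runs `running_hours_per_day` hours each day."""
    return profile.hourly_cost() * running_hours_per_day * days


profile = ClusterProfile(dbus_per_hour=10.0, dbu_rate_usd=0.5, vm_cost_per_hour_usd=5.0)
always_on = monthly_cost(profile, running_hours_per_day=24)
auto_terminated = monthly_cost(profile, running_hours_per_day=6)
print(f"always-on: ${always_on:,.2f}  auto-terminated: ${auto_terminated:,.2f}")
```

Under these assumed inputs the always-on cluster costs four times as much as one that runs only six hours a day, purely because of idle time. A real comparison would substitute actual rates and measured utilization.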
- scoped_to: Lakehouse, S3
- implements: Lakehouse Architecture (coined the pattern)
- depends_on: Apache Spark, Delta Lake, Unity Catalog
- competes_with: Snowflake (the dominant platform-war axis in 2026)
Definition
A unified data + AI platform built on Apache Spark and Delta Lake, with a managed lakehouse architecture spanning data engineering, SQL analytics, ML/AI, and (as of 2026) operational application data via **Lakebase**. Originated the lakehouse pattern. The platform is deeply integrated with cloud object stores (AWS S3, Azure Blob Storage, GCS): every Databricks workspace ultimately reads and writes its data in S3 or an equivalent object store.
Enterprises wanted SQL warehouse ergonomics with data-lake economics — running Spark and Hive on raw S3 was operationally heavy and lacked transaction guarantees. Databricks bundled cluster management, Delta Lake transactions, **Unity Catalog** governance, and a managed runtime so operators could treat S3 as the system of record without managing the substrate.
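The "transaction guarantees" point can be illustrated with a simplified sketch of log-based optimistic concurrency, loosely modeled on how Delta Lake's transaction log is organized (numbered JSON commit files, where creating version N succeeds for only one writer). This is not Delta Lake's actual implementation: the real protocol adds conflict detection, checkpoints, and more. Here a local temporary directory stands in for the object store, exclusive file creation stands in for an atomic put-if-absent, and `commit` and `latest_version` are hypothetical helper names.

```python
import json
import os
import tempfile


def commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to write commit `version`; return False if another writer got there first."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" fails if the file already exists: a local stand-in for an
        # atomic put-if-absent on the object store.
        with open(path, "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return True
    except FileExistsError:
        return False


def latest_version(log_dir: str) -> int:
    """Highest committed version, or -1 for an empty log."""
    versions = [int(name.split(".")[0]) for name in os.listdir(log_dir) if name.endswith(".json")]
    return max(versions, default=-1)


log_dir = tempfile.mkdtemp()

# Two writers read the same snapshot and race for the same next version.
next_version = latest_version(log_dir) + 1
writer_a_ok = commit(log_dir, next_version, [{"add": {"path": "part-0001.parquet"}}])
writer_b_ok = commit(log_dir, next_version, [{"add": {"path": "part-0002.parquet"}}])

# Writer B lost the race, so it re-reads the log and retries at the next version.
writer_b_retry_ok = commit(log_dir, latest_version(log_dir) + 1, [{"add": {"path": "part-0002.parquet"}}])
```

Readers reconstruct the table state by replaying the commit files in version order, so a half-written job never becomes visible: either its commit file exists or it does not.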
Typical workloads: production lakehouse pipelines, large-scale ETL with Delta Lake, ML model training on S3-resident data, SQL warehouse workloads with cost-efficient compute, real-time streaming with Spark Structured Streaming, and (with Lakebase) operational application backends colocated with the analytical lakehouse.
Recent developments
- Databricks Lakebase: operational + analytical convergence (May 2026 GA). Lakebase launched as a new product class that explicitly removes the boundary between operational and analytical systems: a single engine serves OLTP-shaped reads and writes for application data while the same data flows into the analytical lakehouse with no separate ETL. Agent prototypes validated in production read operational data, query the lakehouse, and run vector search in a single transaction. Early customer signals: Warner Music Group, Hafnia, and air transport and logistics companies running production workloads on Lakebase.
- Business momentum dwarfs the format wars. Databricks reports $4B+ ARR growing 65% YoY and has raised at a $34B valuation. Lakebase adoption is growing 2× faster than data warehousing, signalling that the operational/analytical convergence pitch is landing with enterprise buyers. Combined with the Tabular acquisition (June 2024), Databricks now has commercial influence over both Delta Lake and Iceberg, and is publicly arguing for format convergence rather than competition.
- Unity Catalog open-sourcing strategy. Databricks open-sourced Unity Catalog in June 2024 with Iceberg REST Catalog API support, directly answering Snowflake's Apache Polaris move. The result: both major lakehouse vendors now ship open catalog implementations, and the catalog layer is becoming the most contested architectural surface, since whoever controls the catalog controls table-format choice.
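As a rough illustration of why a shared catalog API matters, the sketch below builds the "load table" URL defined by the Iceberg REST Catalog specification (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`, with multi-level namespaces joined by the 0x1F unit separator). The host `catalog.example.com` and the helper name `load_table_url` are hypothetical. Where a particular Unity Catalog or Polaris deployment exposes this endpoint is deployment-specific and not stated here.

```python
from urllib.parse import quote


def load_table_url(base_url: str, namespace_levels: list, table: str, prefix: str = "") -> str:
    """Build the Iceberg REST 'load table' URL for a possibly multi-level namespace."""
    # The spec joins namespace levels with the 0x1F unit separator, which is
    # percent-encoded as %1F when placed in a URL path segment.
    namespace = "\x1f".join(namespace_levels)
    parts = ["v1"]
    if prefix:
        # Optional catalog-specific prefix; passed through as-is (assumption).
        parts.append(prefix.strip("/"))
    parts += ["namespaces", quote(namespace, safe=""), "tables", quote(table, safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Hypothetical catalog endpoint; any spec-compliant catalog would accept the same path shape.
url = load_table_url("https://catalog.example.com/api", ["sales", "raw"], "orders")
print(url)
```

Because the path shape is fixed by the specification, an engine that speaks this API can, in principle, point at either vendor's catalog by changing only the base URL and credentials.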
Resources
- Official Databricks platform site: overview of the unified data + AI platform, Lakebase, Delta Lake, Unity Catalog, and Mosaic AI offerings.
- Official Databricks documentation covering workspace administration, runtime versions, Delta Lake operations, Unity Catalog governance, and the Lakebase OLTP-shaped surface.
- Lakebase-specific community articles from 2026, including the launch announcement and operational/analytical convergence patterns.