Lakehouse
The convergence of data lake storage (raw files on object storage) with data warehouse capabilities — ACID transactions, schema enforcement, SQL access, time-travel.
Summary
The lakehouse sits between raw object storage and business analytics. It is the architectural layer where table formats (Iceberg, Delta Lake, Hudi) add structure to files on object storage such as S3, so that SQL engines can query them reliably.
- A lakehouse is not just "a data lake with SQL." The key differentiator is transactional guarantees — ACID, schema evolution, snapshot isolation — provided by table format specs.
- Lakehouse does not eliminate ETL. It eliminates the second copy of data in a separate warehouse, but data still needs transformation.
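To make the transactional guarantees above concrete, here is a minimal Python sketch of how a table format can layer atomic commits, snapshot isolation, and time travel over immutable files. It is a toy in-memory model, not the Iceberg, Delta Lake, or Hudi implementation: `ToyTable`, `Snapshot`, `commit_append`, and the file names are all hypothetical, and a dictionary stands in for object storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """An immutable list of data-file names that make up one table version."""
    snapshot_id: int
    files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format's metadata layer."""
    # Fake object store: file name -> rows (assumption for illustration only).
    objects: Dict[str, List[dict]] = field(default_factory=dict)
    # Snapshot log; the table starts with an empty snapshot 0.
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit_append(self, base: Snapshot, file_name: str, rows: List[dict]) -> Snapshot:
        """Write a new data file, then publish it with a single metadata update."""
        # Data files are written first; they stay invisible until a snapshot lists them.
        self.objects[file_name] = list(rows)
        # Optimistic concurrency: refuse to commit on top of a stale snapshot.
        if base.snapshot_id != self.current().snapshot_id:
            raise RuntimeError("commit conflict: table changed since base snapshot")
        new = Snapshot(base.snapshot_id + 1, base.files + (file_name,))
        self.snapshots.append(new)  # the atomic "pointer swap" in this toy model
        return new

    def scan(self, snapshot: Optional[Snapshot] = None) -> List[dict]:
        """Read every row visible in the given snapshot (default: latest)."""
        snap = self.current() if snapshot is None else snapshot
        return [row for name in snap.files for row in self.objects[name]]


table = ToyTable()
s1 = table.commit_append(table.current(), "part-0001.parquet", [{"id": 1}])
reader_view = table.current()            # a reader pins snapshot 1
s2 = table.commit_append(s1, "part-0002.parquet", [{"id": 2}])

rows_pinned = table.scan(reader_view)    # [{"id": 1}]: unaffected by the later commit
rows_latest = table.scan()               # [{"id": 1}, {"id": 2}]
rows_as_of_v1 = table.scan(table.snapshots[1])  # time travel to snapshot 1
```

In this sketch the pinned reader keeps seeing the snapshot it started with, and a writer that builds on a stale snapshot is rejected instead of silently overwriting newer data. Real table formats follow broadly the same idea, with metadata or log files on object storage taking the place of the in-memory list.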
- Lakehouse (scoped_to Object Storage): the lakehouse stores all data on object storage
- Lakehouse Architecture (scoped_to Lakehouse): the concrete architectural pattern
- Apache Iceberg, Delta Lake, Apache Hudi (scoped_to Lakehouse): table format technologies
- Medallion Architecture (scoped_to Lakehouse): a data quality pattern within lakehouses
- Iceberg Table Spec, Delta Lake Protocol, Apache Hudi Spec (scoped_to Lakehouse): the specifications that define table semantics
Definition
The convergence of data lake storage (raw files on object storage) with data warehouse capabilities (ACID transactions, schema enforcement, SQL access, time-travel).
Data lakes offered cheap, scalable storage but lacked reliability guarantees. Data warehouses offered guarantees but were expensive and siloed. The lakehouse concept unifies both on a single object storage layer.
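Schema enforcement is one of the warehouse capabilities named in the definition. A minimal Python sketch of the idea follows: writes are checked against a declared schema before they are accepted, and the schema can evolve by adding a nullable column. The `Schema` alias, `validate_rows`, and `add_column` are hypothetical names for illustration, Python types stand in for real column types, and every column is assumed to be nullable. Actual table formats define their own type systems and evolution rules.

```python
from typing import Dict, List

# Column name -> Python type (toy stand-in for a table schema).
Schema = Dict[str, type]


def validate_rows(schema: Schema, rows: List[dict]) -> None:
    """Reject a write whose rows do not match the table schema."""
    for i, row in enumerate(rows):
        unknown = set(row) - set(schema)
        if unknown:
            raise ValueError(f"row {i}: unknown columns {sorted(unknown)}")
        for column, expected in schema.items():
            value = row.get(column)
            # A missing key or None is treated as NULL (all columns nullable here).
            if value is not None and not isinstance(value, expected):
                raise ValueError(
                    f"row {i}: column {column!r} expects {expected.__name__}, "
                    f"got {type(value).__name__}"
                )


def add_column(schema: Schema, name: str, column_type: type) -> Schema:
    """Evolve the schema by adding a nullable column; older rows remain valid."""
    if name in schema:
        raise ValueError(f"column {name!r} already exists")
    return {**schema, name: column_type}


schema: Schema = {"id": int, "amount": float}
validate_rows(schema, [{"id": 1, "amount": 9.5}])       # accepted

schema_v2 = add_column(schema, "currency", str)
validate_rows(schema_v2, [{"id": 1, "amount": 9.5}])    # old-shape row still valid
validate_rows(schema_v2, [{"id": 2, "amount": 3.0, "currency": "GBP"}])
```

A write such as `validate_rows(schema, [{"id": "oops", "amount": 1.0}])` raises `ValueError` in this sketch, which is the behavior a plain data lake of raw files does not give you.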
Recent developments
- Vertical reference architectures published — UK financial services as a leading example. Per the 2026 UK Financial Services Lakehouse Reference Architecture, regulated industries are now publishing vertical-specific reference architectures that codify Iceberg/Delta + catalog + governance choices for compliance contexts. The pattern is generalizable: large regulated verticals (financial services, healthcare, public sector) increasingly converge on a small number of lakehouse reference shapes rather than each organization rolling its own.
- Lakehouse architecture as an interview-required topic. Per DataDriven's analysis of 1,042 verified data-engineering interview rounds, lakehouse-architecture questions covering Delta vs Iceberg vs Hudi vs Paimon now appear in production-grade interviews at unprecedented frequency. The practical implication: organizations evaluating engineering candidates now treat lakehouse fluency as a baseline competency, which in turn accelerates the rate at which lakehouse decisions made in 2024–2025 lock in for 2026+.
Resources
"Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics" (Armbrust et al., CIDR 2021) is the canonical academic paper defining the lakehouse paradigm.
Databricks' glossary entry distills the lakehouse concept into an accessible overview with diagrams comparing it to data lakes and warehouses.
Databricks' well-architected data lakehouse documentation covers the architectural pillars for production implementations.