lakeFS
A Git-like version control system for data lakes on S3, providing branching, committing, merging, and rollback for datasets stored in object storage.
Summary
lakeFS adds software engineering workflows (branch, test, merge) to S3 data management. Teams can experiment on data branches without affecting production, validate changes before publishing, and roll back to any previous state — all without copying data.
- lakeFS is not a new storage system. It is a metadata layer on top of existing S3 storage. Data stays in S3; lakeFS manages pointer references and branches.
- lakeFS exposes an S3-compatible gateway, but it is not a general-purpose S3 server. It manages versioned data lake access, not arbitrary object storage.
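The "metadata layer" idea in the points above can be pictured with a toy model. The sketch below is not lakeFS code or its API. `ToyRepo` and every method name are hypothetical, and it simplifies heavily: each write commits immediately, and commit ids are derived from the tree alone. It only illustrates why branching and rollback need no data copies when branches are pointers to immutable snapshots.

```python
import hashlib


class ToyRepo:
    """Toy model (hypothetical, not lakeFS internals): data objects are stored
    once; commits and branches are only pointers to them."""

    def __init__(self):
        self.objects = {}   # content hash -> bytes (stands in for objects in S3)
        self.commits = {}   # commit id -> {path: content hash}
        self.branches = {"main": self._commit({})}  # branch name -> commit id

    def _commit(self, tree):
        # Simplification: the id is derived from the tree only (no parents, no metadata).
        cid = hashlib.sha256(repr(sorted(tree.items())).encode()).hexdigest()[:12]
        self.commits[cid] = dict(tree)
        return cid

    def create_branch(self, name, source):
        # Zero-copy: the new branch points at the same commit as the source.
        self.branches[name] = self.branches[source]

    def write(self, branch, path, data):
        # Store the bytes once, then move the branch to a new snapshot.
        h = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(h, data)
        tree = dict(self.commits[self.branches[branch]])
        tree[path] = h
        self.branches[branch] = self._commit(tree)

    def read(self, branch, path):
        return self.objects[self.commits[self.branches[branch]][path]]

    def reset(self, branch, commit_id):
        # Rollback is a pointer move; no object is rewritten or deleted.
        self.branches[branch] = commit_id


repo = ToyRepo()
repo.write("main", "events/day1.csv", b"a,b\n1,2\n")
known_good = repo.branches["main"]

repo.create_branch("experiment", "main")
repo.write("experiment", "events/day1.csv", b"a,b\n9,9\n")  # main is unaffected

repo.write("main", "events/day1.csv", b"corrupted")
repo.reset("main", known_good)                               # roll main back
```

In this model, creating `experiment` adds one dictionary entry and nothing else, and the reset restores `main` without touching any stored object.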
- implements S3 API (gateway): S3-compatible access to branched data
- enables Write-Audit-Publish: branch-based data quality gating
- scoped_to Data Versioning: Git-like version control for data
- solves Schema Evolution: test schema changes on branches before merging
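Because the gateway speaks the S3 API, existing S3 clients reach branched data purely through how they address objects. The helpers below are a minimal sketch, assuming the addressing convention in which the repository takes the place of the bucket and the first key segment names the branch (or other ref). The function names, repository name, and paths are hypothetical and are not part of any lakeFS SDK.

```python
def gateway_uri(repository: str, ref: str, path: str, scheme: str = "s3") -> str:
    """Build the URI an S3 client would use for an object on a given ref.

    Assumed convention: <scheme>://<repository>/<ref>/<path>.
    """
    if not repository or not ref:
        raise ValueError("repository and ref are required")
    return f"{scheme}://{repository}/{ref}/{path.lstrip('/')}"


def split_gateway_key(key: str) -> tuple[str, str]:
    """Split an object key as seen through the gateway into (ref, path)."""
    ref, _, path = key.partition("/")
    return ref, path


# The same logical file on two branches differs only in the ref segment.
prod = gateway_uri("example-repo", "main", "tables/events/part-0.parquet")
test = gateway_uri("example-repo", "schema-change", "tables/events/part-0.parquet")
```

Under that assumption, pointing a job at a different branch is a path change only, with no change to the client or its S3 calls.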
Definition
An open-source platform providing Git-like version control for data lakes on S3 — branching, committing, merging, and reverting datasets as first-class operations on object storage.
Data pipelines produce errors that need to be rolled back, experiments need isolated branches, and production data needs quality gates before promotion. lakeFS brings software engineering workflows (branch, test, merge) to S3-stored data.
Typical use cases: data CI/CD pipelines, write-audit-publish workflows, experiment isolation, and data rollback and recovery.
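The write-audit-publish pattern named above can be sketched in a few lines. This is an in-memory illustration only, not a lakeFS client. The `write_audit_publish` function, the `etl-staging` branch name, and the sample checks are all hypothetical, and "publish" is modeled as a simple fast-forward that assumes `main` did not change while the staging branch existed.

```python
from typing import Callable, Dict, List

Tree = Dict[str, list]  # path -> rows; a hypothetical stand-in for files on a branch


def write_audit_publish(
    branches: Dict[str, Tree],
    write: Callable[[Tree], None],
    audit: Callable[[Tree], List[str]],
    staging: str = "etl-staging",
) -> List[str]:
    """Run one write-audit-publish cycle and return the audit problems (empty on success)."""
    # 1. Write: work on an isolated branch that starts from main's pointers.
    #    The copy is shallow, so `write` must replace entries, not mutate rows in place.
    branches[staging] = dict(branches["main"])
    write(branches[staging])

    # 2. Audit: run quality checks against the staged state only.
    problems = audit(branches[staging])

    # 3. Publish: promote on success, otherwise discard the branch.
    if problems:
        del branches[staging]
        return problems
    branches["main"] = branches.pop(staging)
    return []


def no_null_amounts(tree: Tree) -> List[str]:
    return [
        f"{path}: null amount"
        for path, rows in tree.items()
        if any(row["amount"] is None for row in rows)
    ]


def bad_load(tree: Tree) -> None:
    tree["orders.csv"] = tree["orders.csv"] + [{"id": 2, "amount": None}]


def good_load(tree: Tree) -> None:
    tree["orders.csv"] = tree["orders.csv"] + [{"id": 2, "amount": 5.0}]


branches = {"main": {"orders.csv": [{"id": 1, "amount": 10.0}]}}
rejected = write_audit_publish(branches, bad_load, no_null_amounts)
accepted = write_audit_publish(branches, good_load, no_null_amounts)
```

The failed load never becomes visible on `main`; only the load that passes the audit is promoted.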
Recent developments
- Latest release: v1.82.0 (GA June 16, 2026). Adds a `lakefs/<version>` User-Agent to S3-gateway requests (so lakeFS traffic is identifiable in object-store logs), a `lakectl local commit -y/--yes` flag for automation, and ARM64 stability fixes. It also ships a critical PostgreSQL-driver fix (CVE-2026-33816, via a pgx/v5 upgrade), making it a recommended upgrade. Per treeverse/lakeFS releases.
- Data Lake Mount: versioned object storage as a local filesystem (June 2026). lakeFS Mount exposes S3, Azure Data Lake Storage, and GCS as a mounted filesystem with branch/commit semantics, so training jobs and notebooks read versioned data through ordinary file paths instead of an S3 client, directly addressing the POSIX gap for ML workloads on object storage. Per the lakeFS blog.
- Strategic pivot to the "trusted data layer for agentic AI" (Q2 2026). lakeFS's mid-2026 messaging (CEO Einat Orr; "Meet lakeFS for Agentic AI," June 10) reframes data version control as the substrate autonomous agents need for reliable enterprise workflows — reproducible, auditable, branch-isolated data an agent can act on without corrupting production. It's the data-versioning category's bid to be load-bearing in the agent stack, adjacent to AI Memory Infrastructure. Per the lakeFS blog.
- lakeFS shipped a standards-compliant Iceberg REST Catalog (lakeFS Enterprise). A fully Iceberg-REST-spec-compliant catalog that brings Git-style branch/commit/merge to Apache Iceberg tables and works out-of-the-box with Spark, Trino, Flink, and PyIceberg — no proprietary format or vendor lock-in. It moves lakeFS from an S3-gateway versioning layer into a first-class participant in the Iceberg REST catalog convergence alongside Apache Polaris, Unity Catalog, and Nessie. Per lakeFS Iceberg REST Catalog: Version Control at Scale.
- lakeFS acquired the DVC open-source project from Iterative.ai (November 18, 2025). DVC (Data Version Control) — the Git-for-ML-data tool for data scientists on smaller datasets — joins lakeFS's enterprise-scale object-storage versioning; DVC stays 100% open-source under lakeFS stewardship while lakeFS consolidates the data-version-control category from individual ML projects up to Fortune-100 AI infrastructure. Per lakeFS Acquires DVC, Uniting Data Version Control Pioneers (PR Newswire).
- Relationship to Iceberg-native branching tightening. Apache Iceberg v3's native branching primitives close part of the gap that lakeFS originally filled. The 2026 framing for new deployments: pick lakeFS when you need branching across heterogeneous storage and engines; pick Iceberg-native branching when you're committed to Iceberg-only workflows. The two are increasingly complementary rather than direct substitutes.
Resources
Official lakeFS documentation for Git-like version control over S3 data lakes with branching, commits, and merges.
lakeFS source repository with the Go server, S3 gateway, and client SDKs.
lakeFS engineering blog covering data versioning patterns, CI/CD for data, and lakehouse best practices.