lakeFS
A Git-like version control system for data lakes on S3, providing branching, committing, merging, and rollback for datasets stored in object storage.
Summary
lakeFS adds software engineering workflows (branch, test, merge) to S3 data management. Teams can experiment on data branches without affecting production, validate changes before publishing, and roll back to any previously committed state, all without copying data.
- lakeFS is not a new storage system. It is a metadata layer on top of existing S3 storage. Data stays in S3; lakeFS manages pointer references and branches.
- lakeFS exposes an S3-compatible gateway, but it is not a general-purpose S3 server. It manages versioned data lake access, not arbitrary object storage.
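To make the gateway point concrete: through the S3-compatible endpoint, a lakeFS repository takes the place of the bucket, and the branch (or other ref) is the first segment of the object key. The following is a minimal sketch of that mapping. The helpers `gateway_location` and `gateway_uri`, and the repository, branch, and path names, are hypothetical and not part of any lakeFS SDK; the code only builds strings.

```python
from typing import NamedTuple


class GatewayLocation(NamedTuple):
    bucket: str  # lakeFS repository name, used where an S3 client expects a bucket
    key: str     # "<ref>/<path>", used where an S3 client expects an object key


def gateway_location(repository: str, ref: str, path: str) -> GatewayLocation:
    """Map (repository, ref, path) to the bucket/key pair an S3 client would send.

    Hypothetical helper for illustration only.
    """
    if not repository or not ref:
        raise ValueError("repository and ref must be non-empty")
    return GatewayLocation(bucket=repository, key=f"{ref}/{path.lstrip('/')}")


def gateway_uri(repository: str, ref: str, path: str) -> str:
    """Render the same location as an s3:// style URI."""
    loc = gateway_location(repository, ref, path)
    return f"s3://{loc.bucket}/{loc.key}"


# The same logical path on two branches resolves to two different keys,
# so readers of "main" are unaffected by writes to "experiment-1".
prod = gateway_uri("example-repo", "main", "events/2024/part-0.parquet")
dev = gateway_uri("example-repo", "experiment-1", "events/2024/part-0.parquet")
print(prod)
print(dev)
```

An S3 client pointed at the lakeFS endpoint (for example, boto3 created with `endpoint_url` set to the lakeFS server address) would then pass `bucket` and `key` as it would for any other S3 request.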
- implements S3 API (gateway): S3-compatible access to branched data
- enables Write-Audit-Publish: branch-based data quality gating
- scoped_to Data Versioning: Git-like version control for data
- solves Schema Evolution: test schema changes on branches before merging
Definition
An open-source platform providing Git-like version control for data lakes on S3 — branching, committing, merging, and reverting datasets as first-class operations on object storage.
Data pipelines produce errors that need to be rolled back, experiments need isolated branches, and production data needs quality gates before promotion. lakeFS brings software engineering workflows (branch, test, merge) to S3-stored data.
Typical use cases: data CI/CD pipelines, write-audit-publish workflows, experiment isolation, and data rollback and recovery.
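The branch, test, merge workflow can be illustrated with a toy in-memory model in which commits are immutable maps from logical paths to physical object addresses and branches are pointers to commits. This is a sketch under stated assumptions, not how lakeFS is implemented: `ToyRepo`, `write_audit_publish`, the commit IDs, and the object addresses are all hypothetical, and the merge is simplified to fast-forward only.

```python
import itertools
from typing import Callable, Dict, Iterator, Optional

Snapshot = Dict[str, str]  # logical path -> physical object address


class ToyRepo:
    """Toy model: commits are immutable snapshots, branches are pointers."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.commits: Dict[str, Snapshot] = {"c0": {}}
        self.parents: Dict[str, Optional[str]] = {"c0": None}
        self.branches: Dict[str, str] = {"main": "c0"}
        self.staged: Dict[str, Snapshot] = {}

    def branch(self, name: str, source: str) -> None:
        # Zero-copy: the new branch is another pointer to the same commit.
        self.branches[name] = self.branches[source]

    def write(self, branch: str, path: str, address: str) -> None:
        # Uncommitted changes are staged per branch.
        self.staged.setdefault(branch, {})[path] = address

    def commit(self, branch: str) -> str:
        parent = self.branches[branch]
        snapshot = {**self.commits[parent], **self.staged.pop(branch, {})}
        cid = f"c{next(self._ids)}"
        self.commits[cid], self.parents[cid] = snapshot, parent
        self.branches[branch] = cid
        return cid

    def _ancestors(self, cid: Optional[str]) -> Iterator[str]:
        while cid is not None:
            yield cid
            cid = self.parents[cid]

    def merge(self, source: str, dest: str) -> None:
        # Simplification: only fast-forward merges are supported here.
        src_head, dst_head = self.branches[source], self.branches[dest]
        if dst_head not in self._ancestors(src_head):
            raise RuntimeError("toy model supports fast-forward merges only")
        self.branches[dest] = src_head

    def rollback(self, branch: str, commit_id: str) -> None:
        if commit_id not in self.commits:
            raise KeyError(commit_id)
        self.branches[branch] = commit_id  # move the pointer, copy nothing

    def read(self, branch: str) -> Snapshot:
        return dict(self.commits[self.branches[branch]])


def write_audit_publish(repo: ToyRepo, new_objects: Snapshot,
                        audit: Callable[[Snapshot], bool],
                        work_branch: str = "etl-run") -> bool:
    repo.branch(work_branch, "main")            # write: isolate the change
    for path, address in new_objects.items():
        repo.write(work_branch, path, address)
    repo.commit(work_branch)
    if not audit(repo.read(work_branch)):       # audit: validate on the branch
        return False                            # main never sees the bad data
    repo.merge(work_branch, "main")             # publish: promote to main
    return True


def only_parquet(snapshot: Snapshot) -> bool:
    return all(path.endswith(".parquet") for path in snapshot)


repo = ToyRepo()
ok = write_audit_publish(
    repo, {"events/part-0.parquet": "s3://example-bucket/data/obj-a1"}, only_parquet)
bad = write_audit_publish(
    repo, {"events/part-1.csv": "s3://example-bucket/data/obj-b2"}, only_parquet,
    work_branch="etl-run-2")
print(ok, bad, sorted(repo.read("main")))
```

In this sketch the failed run leaves `main` untouched, and `repo.rollback("main", "c0")` would return `main` to its initial empty state by moving a pointer rather than rewriting objects.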
Resources
Official lakeFS documentation for Git-like version control over S3 data lakes with branching, commits, and merges.
lakeFS source repository with the Go server, S3 gateway, and client SDKs.
lakeFS engineering blog covering data versioning patterns, CI/CD for data, and lakehouse best practices.