Technology

lakeFS

A Git-like version control system for data lakes on S3, providing branching, committing, merging, and rollback for datasets stored in object storage.


Summary

What it is

A Git-like version control system for data lakes on S3, providing branching, committing, merging, and rollback for datasets stored in object storage.

Where it fits

lakeFS adds software engineering workflows (branch, test, merge) to S3 data management. Teams can experiment on data branches without affecting production, validate changes before publishing, and roll back a branch to an earlier commit, all without copying data.
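
To make "without copying data" concrete, here is a minimal toy model of pointer-based versioning. It is a sketch under stated assumptions, not lakeFS code: `ToyRepo` and all of its methods are hypothetical names, the dictionary `objects` stands in for S3, and the merge is a simplified fast-forward rather than a real merge algorithm.

```python
import hashlib


class ToyRepo:
    """Toy model: objects are stored once; commits and branches are only metadata."""

    def __init__(self):
        self.objects = {}                 # content address -> bytes (stands in for S3)
        self.commits = {"c0": {}}         # commit id -> {path: content address}
        self.branches = {"main": "c0"}    # branch name -> commit id

    def branch(self, name, source):
        # Creating a branch copies one pointer, never any object data.
        self.branches[name] = self.branches[source]

    def commit(self, branch, changes):
        # Store the new objects, then record a new path -> address snapshot.
        snapshot = dict(self.commits[self.branches[branch]])
        for path, data in changes.items():
            address = hashlib.sha256(data).hexdigest()
            self.objects[address] = data
            snapshot[path] = address
        commit_id = f"c{len(self.commits)}"
        self.commits[commit_id] = snapshot
        self.branches[branch] = commit_id
        return commit_id

    def read(self, ref, path):
        commit_id = self.branches.get(ref, ref)   # ref is a branch name or a commit id
        return self.objects[self.commits[commit_id][path]]

    def merge(self, source, dest):
        # Simplified fast-forward merge: dest adopts source's commit.
        self.branches[dest] = self.branches[source]

    def reset(self, branch, commit_id):
        # Rollback: move the branch pointer back to an earlier commit.
        self.branches[branch] = commit_id


repo = ToyRepo()
v1 = repo.commit("main", {"sales.csv": b"id,amount\n1,10\n"})
repo.branch("experiment", "main")
objects_after_branch = len(repo.objects)          # branching stored nothing new
repo.commit("experiment", {"sales.csv": b"id,amount\n1,10\n2,20\n"})
main_during_experiment = repo.read("main", "sales.csv")
repo.merge("experiment", "main")
main_after_merge = repo.read("main", "sales.csv")
repo.reset("main", v1)
main_after_rollback = repo.read("main", "sales.csv")
```

In this model a branch costs one dictionary entry, production reads are unaffected while the experiment branch changes, and rollback is a pointer move. Every version of the file remains addressable by its commit id.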

Misconceptions / Traps
  • lakeFS is not a new storage system. It is a metadata layer on top of existing S3 storage. Data stays in S3; lakeFS tracks commits, branches, and the pointers that map each version to the underlying objects.
  • lakeFS exposes an S3-compatible gateway, but it is not a general-purpose S3 server. It manages versioned data lake access, not arbitrary object storage.
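
As a rough illustration of gateway access: through the lakeFS S3 gateway, the repository takes the place of the bucket and the branch (or other ref) becomes the leading component of the object key. The helper below is a hypothetical sketch of that mapping; `gateway_address`, the repository name `analytics`, the branch `fix-orders`, and the endpoint URL in the comment are all illustrative assumptions.

```python
def gateway_address(repository, ref, path):
    """Map (repository, ref, path) to the Bucket/Key pair an S3 client would send."""
    if not repository or not ref:
        raise ValueError("repository and ref are required")
    # The ref (branch, tag, or commit) is the first component of the key.
    return {"Bucket": repository, "Key": f"{ref}/{path.lstrip('/')}"}


prod = gateway_address("analytics", "main", "tables/orders/part-0.parquet")
dev = gateway_address("analytics", "fix-orders", "/tables/orders/part-0.parquet")

# With boto3 installed and a client pointed at a lakeFS endpoint, for example
#   s3 = boto3.client("s3", endpoint_url="https://lakefs.example.com")
# the pairs could be passed straight through: s3.get_object(**prod)
```

The same path read from `main` and from `fix-orders` differs only in the key prefix, which is why existing S3 tooling can switch branches by changing a path.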

Key Connections
  • implements S3 API (gateway) — S3-compatible access to branched data
  • enables Write-Audit-Publish — branch-based data quality gating
  • scoped_to Data Versioning — Git-like version control for data
  • solves Schema Evolution — test schema changes on branches before merging
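
The Write-Audit-Publish connection can be sketched as a small control flow. This is a toy illustration, not the lakeFS API: `branches` is a hypothetical in-memory dictionary standing in for a repository, `write_audit_publish` and `no_negative_amounts` are made-up names, and the list copy stands in for a branch that in lakeFS would be created without copying data.

```python
def write_audit_publish(branches, write, audit, staging="staging", target="main"):
    """Run one write-audit-publish cycle over a dict of branch name -> rows."""
    # Write: stage the change on an isolated branch derived from the target.
    branches[staging] = write(list(branches[target]))
    # Audit: run quality checks against the staged data only.
    problems = audit(branches[staging])
    if problems:
        del branches[staging]          # discard the branch; the target is untouched
        return False, problems
    # Publish: promote the audited data and drop the staging branch.
    branches[target] = branches.pop(staging)
    return True, []


def no_negative_amounts(rows):
    return [f"negative amount in row {r['id']}" for r in rows if r["amount"] < 0]


branches = {"main": [{"id": 1, "amount": 10}]}

ok_bad, problems = write_audit_publish(
    branches, lambda rows: rows + [{"id": 2, "amount": -5}], no_negative_amounts
)
ok_good, _ = write_audit_publish(
    branches, lambda rows: rows + [{"id": 3, "amount": 7}], no_negative_amounts
)
```

The failed batch never reaches `main`; only the batch that passes the audit is promoted. The same gate applies to schema changes: the audit function would check the staged schema instead of row values.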

Definition

What it is

An open-source platform providing Git-like version control for data lakes on S3 — branching, committing, merging, and reverting datasets as first-class operations on object storage.

Why it exists

Data pipelines produce errors that need to be rolled back, experiments need isolated branches, and production data needs quality gates before promotion. lakeFS brings software engineering workflows (branch, test, merge) to S3-stored data.

Primary use cases

Data CI/CD pipelines, write-audit-publish workflows, experiment isolation, data rollback and recovery.
