The Local-First S3 Index for LLM Data Infrastructure
211 concepts · 921 relationships · 32 guides
What's in the index
Featured guides
Guide 1
How S3 Shapes Lakehouse Design
Every lakehouse architecture sits on object storage — almost always S3 or an S3-compatible store. But S3 is not a database, and its constrai...
Guide 7
Choosing a Table Format — Iceberg vs. Delta vs. Hudi
The three major open table formats — Apache Iceberg, Delta Lake, and Apache Hudi — all solve the same fundamental problem: adding transactio...
Guide 2
Small Files Problem — Why It Exists and the Common Mitigations
A dataset with 10 million 10KB files performs worse on S3 than the same data in 100 files of 1GB each. The small files problem is the most c...
Guide 4
Where DuckDB Fits (and Where It Doesn't)
Engineers encounter S3-stored data constantly — Parquet files in data lakes, Iceberg tables in lakehouses, ad-hoc exports. Historically, exp...
Explore by problem space
For LLMs & AI assistants