Architecture
Repeatable design patterns that combine multiple technologies to solve structural problems.
8 nodes
Lakehouse Architecture
A unified architecture combining data lake storage (files on S3) with warehouse capabilities (ACID, schema enforcement, SQL access) by using a table f...
Medallion Architecture
A layered data quality pattern with three tiers, Bronze (raw), Silver (cleansed), and Gold (business-ready), each stored on object storage.
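As a rough illustration of the three layers, the sketch below promotes records from Bronze to Silver to Gold. It is a minimal sketch, not a reference implementation: the `store` dictionary is a hypothetical in-memory stand-in for object storage prefixes, and the record fields (`region`, `amount`) and function names are invented for the example.

```python
# Hypothetical in-memory stand-in for object storage: prefix -> list of records.
store = {"bronze/": [], "silver/": [], "gold/": []}

def ingest_raw(records):
    """Bronze: land records exactly as received, with no cleaning."""
    store["bronze/"].extend(records)

def cleanse():
    """Silver: drop malformed rows and normalize types."""
    cleaned = []
    for row in store["bronze/"]:
        try:
            cleaned.append({
                "region": row["region"].strip().lower(),
                "amount": float(row["amount"]),
            })
        except (KeyError, ValueError, AttributeError, TypeError):
            continue  # malformed rows remain visible in Bronze only
    store["silver/"] = cleaned

def aggregate():
    """Gold: business-ready totals per region."""
    totals = {}
    for row in store["silver/"]:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    store["gold/"] = [{"region": r, "total": t} for r, t in sorted(totals.items())]

ingest_raw([
    {"region": " EU ", "amount": "10.5"},
    {"region": "eu", "amount": "4.5"},
    {"region": "US", "amount": "oops"},  # fails type normalization
    {"amount": "3"},                      # missing region
])
cleanse()
aggregate()
```

Keeping the raw rows in Bronze untouched means the Silver and Gold layers can be rebuilt at any time if the cleansing rules change.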
Separation of Storage and Compute
The design pattern of keeping data in S3 while running independent, elastically scaled compute engines against it.
Hybrid S3 + Vector Index
A pattern that stores raw data on S3 and maintains a vector index over embeddings that points back to S3 objects.
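The "points back to S3 objects" idea can be sketched as an index whose entries pair an embedding with the URI of the object it was derived from. The example below is a toy under stated assumptions: the bucket name, object keys, and three-dimensional vectors are hypothetical, and a brute-force cosine scan stands in for a real vector index.

```python
import math

# Hypothetical index entries: (embedding, S3 URI of the source object).
index = [
    ([1.0, 0.0, 0.0], "s3://example-bucket/docs/a.txt"),
    ([0.0, 1.0, 0.0], "s3://example-bucket/docs/b.txt"),
    ([0.7, 0.7, 0.0], "s3://example-bucket/docs/c.txt"),
]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query, k=2):
    """Return the S3 URIs of the k entries most similar to the query vector."""
    ranked = sorted(index, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [uri for _, uri in ranked[:k]]

hits = search([0.9, 0.1, 0.0], k=2)
```

The search returns only URIs; the caller would then fetch the matching objects from S3, so the index itself never needs to hold the raw data.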
Offline Embedding Pipeline
A batch pattern where embeddings are generated from S3-stored data on a schedule, with resulting vectors written back to object storage or a vector in...
Local Inference Stack
A pattern of running ML/LLM models on local hardware against data stored in or pulled from S3, avoiding cloud-based inference APIs.
Write-Audit-Publish
A data quality pattern where data lands in a raw S3 zone, undergoes validation, and is promoted to a curated zone only after passing audits.
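The land, validate, promote sequence might look like the sketch below. Everything here is illustrative: the two dictionaries are hypothetical stand-ins for the raw and curated S3 zones, and the audit rules (non-empty batch, `id` present, non-negative numeric `amount`) are assumptions chosen for the example.

```python
# Hypothetical stand-ins for the raw and curated zones: key -> list of rows.
raw_zone = {}
curated_zone = {}

def write(key, rows):
    """Write: land a batch in the raw zone without any checks."""
    raw_zone[key] = rows

def audit(rows):
    """Audit: return a list of failure messages; an empty list means the batch passes."""
    failures = []
    if not rows:
        failures.append("batch is empty")
    for i, row in enumerate(rows):
        if row.get("id") is None:
            failures.append(f"row {i}: missing id")
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            failures.append(f"row {i}: invalid amount")
    return failures

def publish(key):
    """Publish: promote the batch to the curated zone only if the audit passes."""
    failures = audit(raw_zone[key])
    if not failures:
        curated_zone[key] = raw_zone[key]
    return failures

write("orders/good", [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 0}])
write("orders/bad", [{"id": None, "amount": 5}, {"id": 3, "amount": -1}])
good_result = publish("orders/good")
bad_result = publish("orders/bad")
```

A failed batch stays in the raw zone for inspection, so consumers reading from the curated zone never see it.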
Tiered Storage
A pattern of moving data between hot, warm, and cold storage tiers based on access frequency. S3 itself offers tiered storage classes (Standard, Infrequent Access, Glacier).
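A tiering decision based on access recency could be sketched as follows. The 30-day and 90-day thresholds and the function name are assumptions made for the example, not values from S3; in practice such rules are usually expressed as storage lifecycle configuration rather than application code.

```python
from datetime import date

def choose_tier(last_accessed, today, warm_after_days=30, cold_after_days=90):
    """Pick a storage tier from the number of days since the last access.

    The thresholds are illustrative defaults, not S3-defined values.
    """
    idle_days = (today - last_accessed).days
    if idle_days >= cold_after_days:
        return "Glacier"            # cold
    if idle_days >= warm_after_days:
        return "Infrequent Access"  # warm
    return "Standard"               # hot

today = date(2024, 6, 1)
recent = choose_tier(date(2024, 5, 25), today)
stale = choose_tier(date(2024, 4, 15), today)
dormant = choose_tier(date(2024, 1, 1), today)
```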