Architecture

Repeatable design patterns that combine multiple technologies to solve structural problems.

52 nodes
Lakehouse Architecture Architecture

A unified architecture combining data lake storage (files on S3) with warehouse capabilities (ACID, schema enforcement, SQL access…

33 3
Medallion Architecture Architecture

A layered data quality pattern — Bronze (raw), Silver (cleansed), Gold (business-ready) — with each layer stored on object storage…

8 3
Separation of Storage and Compute Architecture

The design pattern of keeping data in S3 while running independent, elastically scaled compute engines against it.

9 3
Hybrid S3 + Vector Index Architecture

A pattern that stores raw data on S3 and maintains a vector index over embeddings that points back to S3 objects.

11 3
Offline Embedding Pipeline Architecture

A batch pattern where embeddings are generated from S3-stored data on a schedule, with resulting vectors written back to object st…

4 3
Local Inference Stack Architecture

A pattern of running ML/LLM models on local hardware against data stored in or pulled from S3, avoiding cloud-based inference APIs…

4 3
Write-Audit-Publish Architecture

A data quality pattern where data lands in a raw S3 zone, undergoes validation, and is promoted to a curated zone only after passi…

6 3
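
The write-audit-publish flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the in-memory dictionaries stand in for the raw and curated S3 zones, and the names `audit` and `write_audit_publish`, along with the specific validation rule, are hypothetical.

```python
from typing import Callable

Record = dict
Zone = dict  # key -> list of rows; a stand-in for an S3 prefix (assumption)


def audit(rows: list) -> bool:
    """Hypothetical check: the batch is non-empty and every row has a non-null 'id'."""
    return bool(rows) and all(r.get("id") is not None for r in rows)


def write_audit_publish(
    key: str,
    rows: list,
    raw_zone: Zone,
    curated_zone: Zone,
    check: Callable[[list], bool] = audit,
) -> bool:
    """Land data in the raw zone, validate it, and promote it only on success."""
    raw_zone[key] = rows                 # write: data always lands in the raw zone
    if not check(raw_zone[key]):         # audit: validate before anyone can read it
        return False                     # failed batches stay in raw, never published
    curated_zone[key] = raw_zone[key]    # publish: promote to the curated zone
    return True
```

The point of the pattern is that consumers read only from the curated zone, so a failed audit leaves the bad batch visible for debugging without exposing it downstream.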
Tiered Storage Architecture

Moving data between hot, warm, and cold storage tiers based on access frequency. S3 itself offers tiering (Standard, Infrequent Ac…

5 3
Geo-Dispersed Erasure Coding Architecture

An erasure coding scheme that distributes data fragments and parity blocks across geographically separated sites, providing durabi…

5 3
NVMe-backed Object Tier Architecture

An architecture placing NVMe flash as a high-performance local storage tier beneath the S3 API, serving hot objects with microseco…

7 2
GPU-Direct Storage Pipeline Architecture

An architecture that streams data directly from storage devices to GPU memory, bypassing the CPU and system memory entirely. Uses …

3 3
RDMA-Accelerated Object Access Architecture

Using RDMA network transport for microsecond-level object storage access within high-performance computing clusters, bypassing ker…

4 2
Cache-Fronted Object Storage Architecture

Placing a cache layer (SSD, Alluxio, CDN, or in-memory cache) in front of S3 to serve frequently accessed objects with lower laten…

4 2
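
As a rough illustration of the cache-fronted pattern, the sketch below wraps a fetch function in a small read-through LRU cache. The `fetch` callable is a stand-in for an S3 GET, and the class name, capacity, and hit/miss counters are assumptions made for the example; a real deployment would use an SSD tier, Alluxio, or a CDN rather than a Python dictionary.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the backing store on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 2):
        self._fetch = fetch            # stand-in for an S3 GET (assumption)
        self._capacity = capacity      # maximum number of cached objects
        self._items = OrderedDict()    # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)              # slow path: go to object storage
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict the least recently used entry
        return value
```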
Checkpoint/Artifact Lake on Object Storage Architecture

Using S3 as the durable repository for ML model checkpoints, trained model artifacts, training logs, and experiment metadata. A ce…

3 3
Training Data Streaming from Object Storage Architecture

Streaming training data directly from S3 into GPU training loops during ML model training, avoiding the need to download entire da…

5 3
Feature/Embedding Store on Object Storage Architecture

Storing ML feature vectors and embedding tables on S3 in columnar formats (Parquet, Lance), enabling cost-effective persistence an…

5 3
Online Embedding Refresh Pipeline Architecture

A continuous pipeline that regenerates vector embeddings as source data in S3 changes, keeping vector indexes in sync with the lat…

5 2
Active-Active Multi-Site Object Replication Architecture

Bidirectional replication between two or more S3-compatible storage sites where all sites accept writes simultaneously, with confl…

5 3
Edge-to-Core Object Aggregation Architecture

A one-way replication pattern where data collected at edge S3-compatible storage nodes is continuously replicated to a central S3 …

4 2
Immutable Backup Repository on Object Storage Architecture

Using S3 Object Lock to create a tamper-proof backup vault where backup data cannot be deleted or modified until the retention per…

6 3
Ransomware-Resilient Object Backup Architecture Architecture

A defense-in-depth backup architecture combining S3 Object Lock, air-gapped replication, anomaly detection on access patterns, and…

5 2
Deletion Vector Architecture

A metadata pattern that tracks which rows in a data file have been logically deleted or updated, using a compact bitmap instead of…

5 2
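
A minimal sketch of the bitmap idea, assuming row positions are zero-based offsets within one immutable data file. A Python integer stands in for the compact bitmap here; the class and function names are hypothetical and do not correspond to any table format's API.

```python
class DeletionVector:
    """Bitmap of logically deleted row positions for one immutable data file."""

    def __init__(self) -> None:
        self._bits = 0  # bit i set means row i is logically deleted

    def delete(self, row: int) -> None:
        self._bits |= 1 << row

    def is_deleted(self, row: int) -> bool:
        return (self._bits >> row) & 1 == 1


def read_live_rows(rows: list, dv: DeletionVector) -> list:
    """Apply the deletion vector at read time; the data file itself is untouched."""
    return [r for i, r in enumerate(rows) if not dv.is_deleted(i)]
```

Deleting a row only flips a bit in metadata, so the underlying file never has to be rewritten at delete time.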
LSM-tree on S3 Architecture

An architectural pattern adapting Log-Structured Merge-tree storage to object storage, where writes are batched into sorted append…

4 2
Compaction Architecture

The background maintenance operation that merges many small data files into fewer, larger files within a table format (Iceberg, De…

9 3
CDC into Lakehouse Architecture

The architecture pattern of capturing row-level changes (inserts, updates, deletes) from operational databases and applying them t…

9 3
Row / Column Security Architecture

The practice of restricting access to specific rows or columns within lakehouse tables based on user identity, role, or policy, en…

8 3
Encryption / KMS Architecture

The combination of data encryption (at rest and in transit) with key management service (KMS) integration to protect S3-stored dat…

10 3
Tenant Isolation Architecture

The set of architectural strategies for ensuring that multiple tenants (customers, business units, or environments) sharing an S3-…

8 3
RAG over Structured Data Architecture

The architecture pattern of using retrieval-augmented generation (RAG) to answer natural language questions against structured dat…

10 3
Clustering / Sort Order Architecture

The practice of physically organizing data files within a table by the values of one or more columns, so that queries filtering on…

7 3
File Sizing Strategy Architecture

The practice of deliberately targeting optimal data file sizes (typically 128 MB to 1 GB for Parquet on S3) to balance S3 request …

8 3
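
To make the file-size trade-off concrete, the following sketch estimates how many files a dataset would be split into at a given target size. The 512 MiB default is an assumption chosen from within the 128 MB to 1 GB range quoted above, and `plan_file_count` is a hypothetical helper, not part of any engine.

```python
import math

MB = 1024 ** 2  # bytes per MiB


def plan_file_count(total_bytes: int, target_bytes: int = 512 * MB) -> int:
    """Number of output files needed to hold total_bytes at roughly target_bytes each."""
    if total_bytes <= 0:
        return 0
    return max(1, math.ceil(total_bytes / target_bytes))


# 100 GiB written as 1 MiB files versus 512 MiB files
dataset = 100 * 1024 * MB
small_files = plan_file_count(dataset, 1 * MB)
large_files = plan_file_count(dataset)
```

Since a scan has to issue at least one request per file, the file count is a rough lower bound on the number of S3 requests a full scan would make.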
Audit Trails Architecture

The practice of recording a tamper-evident history of all data access, modification, and governance events within an S3-based lake…

6 3
PII Tokenization Architecture

The process of replacing personally identifiable information (PII) in S3-stored datasets with non-reversible or reversible tokens,…

9 3
Batch vs Streaming Architecture

The architectural decision between processing S3 data in periodic batch jobs (hourly/daily) versus continuous streaming ingestion,…

8 3
Event-Driven Ingestion Architecture

An architecture pattern where data ingestion into S3-based lakehouses is triggered by events (S3 notifications, Kafka messages, we…

8 3
Manifest Pruning Architecture

The optimization technique used by table formats (especially Iceberg) to skip reading irrelevant manifest files during query plann…

5 3
Compliance-Aware Architectures Architecture

Lakehouse design patterns that embed regulatory requirements (GDPR, CCPA, HIPAA, SOX) directly into the data architecture rather t…

14 3
Branching / Tagging Architecture

The catalog-level capability to create lightweight named references (branches and tags) to specific table states, enabling isolate…

7 3
AI-Safe Views Architecture

The practice of creating constrained, pre-filtered views over lakehouse tables that limit what data AI/LLM systems can access, pre…

8 3
Structured Chunking Architecture

The practice of splitting S3-stored structured and semi-structured data (Parquet files, JSON documents, CSV records) into semantic…

5 3
Benchmarking Methodology Architecture

The discipline of designing, executing, and reporting reproducible performance tests for S3-based data systems, covering throughpu…

6 3
Capacity Planning Architecture

The practice of forecasting and provisioning storage, compute, and network resources for S3-based data systems based on projected …

6 3
Hybrid Metadata Patterns Architecture

Architectural approaches that combine multiple metadata systems (e.g., Glue Catalog for Iceberg tables, OpenMetadata for governanc…

6 3
Interoperability Patterns Architecture

Architectural strategies for enabling multiple table formats (Iceberg, Delta, Hudi), query engines (Spark, Trino, Flink), and cata…

6 3
Non-Blocking Concurrency Control Architecture

A concurrency model for lakehouse table formats that uses distributed timelines rather than locks or optimistic retries, allowing …

5 3
Decoupled Vector Search Architecture

A vector database architecture that separates index storage on object storage from query compute, using Inverted File Indexes (IVF…

5 3
Partitioning Architecture

The strategy of physically organizing table data files by column values so query engines can skip irrelevant files. On S3-backed l…

5 3
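
The file-skipping idea can be sketched as below, assuming a Hive-style `key=value` directory layout. That layout, the example paths, and the `prune` helper are assumptions for illustration; table formats such as Iceberg track partition values in metadata rather than in paths.

```python
def partition_path(table: str, row: dict, keys: list) -> str:
    """Build a Hive-style prefix such as 'events/dt=2024-01-01/region=eu'."""
    return table + "/" + "/".join(f"{k}={row[k]}" for k in keys)


def prune(paths: list, **filters) -> list:
    """Keep only the files whose partition values match every filter."""
    kept = []
    for path in paths:
        # Parse 'key=value' segments out of the object key
        parts = dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)
        if all(parts.get(k) == str(v) for k, v in filters.items()):
            kept.append(path)
    return kept
```

A query filtering on `dt` and `region` would then list and read only the matching prefixes instead of every file in the table.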
Credential Vending Architecture

A security architecture where a control plane issues short-lived, narrowly scoped S3 credentials at query time rather than relying…

6 3
Object Lifecycle Management Architecture

Automated rules that transition S3 objects between storage tiers (Standard → Infrequent Access → Glacier → Deep Archive) or expire…

5 2
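
A lifecycle rule for the tier sequence above might look like the following sketch, which only builds the configuration dictionary and does not call AWS. The day thresholds, the prefix, and the `build_lifecycle_rule` helper are hypothetical; the dictionary shape follows what boto3's `put_bucket_lifecycle_configuration` accepts.

```python
from typing import Optional


def build_lifecycle_rule(
    prefix: str,
    ia_days: int = 30,              # assumed thresholds, tune per workload
    glacier_days: int = 90,
    deep_archive_days: int = 365,
    expire_days: Optional[int] = None,
) -> dict:
    """Build a lifecycle configuration: Standard -> IA -> Glacier -> Deep Archive."""
    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
            {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }
    if expire_days is not None:
        rule["Expiration"] = {"Days": expire_days}  # delete objects after this age
    return {"Rules": [rule]}


config = build_lifecycle_rule("logs/")
# To apply it (not executed here; the bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```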
Lakehouse for AI Workflows Architecture

The architectural pattern of using governed, ACID-transactional lakehouse tables on S3 as the single data substrate for AI/ML pipe…

6 2
Multimodal Object Storage Architecture

An architectural pattern for co-locating heterogeneous data types — images, video, audio, PDFs, sensor streams — alongside structu…

5 2
Redaction Layers Architecture

A query-time data protection architecture that dynamically masks, tokenizes, or filters sensitive fields from S3-backed lakehouse …

6 2