Separation of Storage and Compute
Summary
What it is
The design pattern of keeping data in S3 while running independent, elastically scaled compute engines against it.
Where it fits
This is the foundational architectural principle of the S3 ecosystem. Every query engine, table format, and data pipeline in this index assumes storage and compute are separate — data stays in S3, compute spins up and down on demand.
Misconceptions / Traps
- Separation of storage and compute does not mean "no local storage." Caching, spill-to-disk, and local indexes are still used — the principle is that the source of truth is in S3.
- Network latency between compute and S3 is the fundamental trade-off. Any read that is not served from a local cache pays the cost of an HTTP request to S3 instead of a local disk read.
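The two points above can be illustrated with a minimal sketch of a read-through cache. Everything here is hypothetical: `ObjectStore` is an in-memory stand-in for S3 (not a real client), `ComputeNode` is an illustrative name, and `get_calls` simply counts the simulated network round trips.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for S3 (hypothetical): the single source of truth."""
    objects: dict = field(default_factory=dict)
    get_calls: int = 0  # counts simulated network round trips

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        self.get_calls += 1  # each call models one HTTP GET
        return self.objects[key]


@dataclass
class ComputeNode:
    """Disposable compute with a local read-through cache (hypothetical)."""
    store: ObjectStore
    cache: dict = field(default_factory=dict)

    def read(self, key: str) -> bytes:
        if key not in self.cache:
            # Cold read: pay the network cost once and keep a local copy.
            self.cache[key] = self.store.get(key)
        # Warm read: served from local state, no round trip.
        return self.cache[key]


store = ObjectStore()
store.put("events/part-0.parquet", b"row-data")

node = ComputeNode(store)
node.read("events/part-0.parquet")  # cold: goes to the store
node.read("events/part-0.parquet")  # warm: served from the cache
remote_reads_first_node = store.get_calls

# Discarding the node loses only its cache; a replacement re-reads from the store.
replacement = ComputeNode(store)
data = replacement.read("events/part-0.parquet")
```

The cache is an optimization that can be thrown away at any time. Correctness depends only on the object store, which is why compute nodes can be added or removed freely.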
Key Connections
- depends_on S3 API: the interface that enables decoupling
- solves Vendor Lock-In: swap compute engines without moving data
- constrained_by Cold Scan Latency, Egress Cost: the costs of network-based data access
- ClickHouse implements Separation of Storage and Compute
- scoped_to S3, Object Storage
Definition
What it is
The design pattern of keeping the authoritative copy of data in object storage (S3) while running independent, elastically scaled compute engines against it. Compute and storage scale independently.
Why it exists
Coupled storage-and-compute systems (traditional databases, HDFS with co-located compute) force you to scale both together. Decoupling allows you to pay for storage at S3 prices and spin compute up or down on demand.
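A rough sizing sketch may make the scaling argument concrete. All numbers are hypothetical (500 TB of data, 64 cores at peak, nodes with 10 TB and 16 cores each), and the function names are illustrative only. In the coupled case the cluster has to be sized for whichever resource needs more nodes; in the decoupled case only compute is sized, and it can drop to zero when idle.

```python
import math


def coupled_nodes(data_tb, peak_cores, tb_per_node, cores_per_node):
    """Nodes needed when every node bundles a fixed amount of storage and compute."""
    for_storage = math.ceil(data_tb / tb_per_node)
    for_compute = math.ceil(peak_cores / cores_per_node)
    # The larger requirement dictates cluster size; the other resource sits idle.
    return max(for_storage, for_compute)


def decoupled_compute_nodes(cores_needed, cores_per_node):
    """Compute nodes needed when the data lives in object storage."""
    return math.ceil(cores_needed / cores_per_node)


# Hypothetical workload: 500 TB of data, 64 cores at peak, 0 cores when idle.
nodes_coupled = coupled_nodes(500, 64, tb_per_node=10, cores_per_node=16)
nodes_decoupled_peak = decoupled_compute_nodes(64, cores_per_node=16)
nodes_decoupled_idle = decoupled_compute_nodes(0, cores_per_node=16)

print(nodes_coupled, nodes_decoupled_peak, nodes_decoupled_idle)
```

Under these assumptions the coupled cluster needs 50 nodes just to hold the data, even though 4 nodes cover peak compute. The sketch ignores replication, object-storage pricing, and network costs, so it shows only the shape of the trade-off.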
Primary use cases
Elastic analytics (scale query engines independently of data volume), multi-engine access (multiple query engines read the same S3 data), cost optimization.
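The multi-engine use case can be sketched as two independent readers working from the same stored object. This is a toy illustration under stated assumptions: `bucket` is a plain dictionary standing in for an S3 bucket, the key and CSV contents are invented, and `engine_a_total` and `engine_b_by_region` are hypothetical stand-ins for separate query engines.

```python
import csv
import io

# Hypothetical shared bucket: object key -> object bytes (stand-in for S3).
bucket = {
    "sales/2024.csv": b"region,amount\neu,10\nus,25\neu,5\n",
}


def read_rows(bucket, key):
    """Decode one CSV object into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(bucket[key].decode())))


def engine_a_total(bucket, key):
    """'Engine A': full scan that sums the amount column."""
    return sum(int(row["amount"]) for row in read_rows(bucket, key))


def engine_b_by_region(bucket, key):
    """'Engine B': group-by aggregation over the same object."""
    totals = {}
    for row in read_rows(bucket, key):
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])
    return totals


total = engine_a_total(bucket, "sales/2024.csv")
by_region = engine_b_by_region(bucket, "sales/2024.csv")
```

Neither engine owns the data or needs to know the other exists. Either one could be replaced without copying or migrating the object.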
Relationships
Outbound Relationships
- scoped_to S3, Object Storage
- depends_on S3 API
- solves Vendor Lock-In
- constrained_by Cold Scan Latency, Egress Cost
Resources
Snowflake's official architecture documentation describing the three-layer design (storage, compute, cloud services) that pioneered commercial separation of storage and compute.
Databricks' architecture documentation showing how Spark clusters run separately from data on S3/ADLS/GCS object storage.
Databricks glossary entry explaining how the lakehouse pattern depends on decoupled storage and compute.