Turbopuffer
An object-storage-native vector and full-text search engine where S3/GCS is the durable source of truth and SSD/RAM are caches, built around S3 strong consistency and compare-and-swap.
Summary
Turbopuffer is the reference design for retrieval infrastructure on object storage. It competes with RAM-first vector databases (Pinecone, Qdrant) and with pgvector on cost and operational simplicity. Its disruptive angle is storage cost: roughly 10x cheaper, because cold data stays on S3 (~$0.02/GB per month) rather than in DRAM or on replicated SSD.
- It is not RAM-resident: the first query to a cold namespace hits object storage and is slow (~874 ms p50 for 1M docs), so the low cost comes with cold-start latency.
- Write throughput per namespace is bounded by group-commit batching (historically ~1 WAL entry/sec/namespace), so it favors many namespaces over a single hot one.
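As a rough illustration of why a per-namespace commit rate favors spreading writes across namespaces, the sketch below estimates aggregate throughput. The rate of about one commit per second per namespace is the historical figure quoted above; the function name and the batch size of 500 documents per commit are hypothetical values chosen only for the arithmetic.

```python
def aggregate_writes_per_sec(namespaces: int,
                             commits_per_sec: float = 1.0,
                             docs_per_commit: int = 500) -> float:
    """Estimate total documents written per second across namespaces.

    commits_per_sec mirrors the ~1 WAL entry/sec/namespace figure;
    docs_per_commit is an assumed batch size, not a documented limit.
    """
    # Each namespace commits independently, so throughput scales linearly
    # with the number of namespaces rather than with load on a single one.
    return namespaces * commits_per_sec * docs_per_commit


single_hot = aggregate_writes_per_sec(namespaces=1)
many_small = aggregate_writes_per_sec(namespaces=1000)
print(single_hot, many_small)
```

Under these assumptions, a single namespace can only raise throughput by making each batch larger, while adding namespaces raises it linearly.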
- depends_on S3 API: uses S3 strong consistency and CAS as its correctness foundation.
- alternative_to Pinecone: same vector-search job, but object-storage-first economics.
- optimizes_for Retrieval Engineering: purpose-built for RAG/semantic-search retrieval at low cost.
Definition
Turbopuffer is a search engine built from first principles on object storage, providing vector search and full-text/BM25 search with object storage as the durable source of truth. SSD and RAM act only as caches in front of S3/GCS. It is designed for AI applications, semantic search, and recommendation systems at very low cost.
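The "caches in front of object storage" idea can be sketched as a read-through lookup. This is a minimal illustration and not Turbopuffer's implementation: the `TieredReader` class and its dictionaries are hypothetical stand-ins for RAM, an NVMe cache, and S3/GCS.

```python
from typing import Dict, Optional


class TieredReader:
    """Read-through lookup: RAM, then SSD, then object storage (source of truth)."""

    def __init__(self, object_store: Dict[str, bytes]) -> None:
        self.object_store = object_store  # stand-in for S3/GCS
        self.ssd: Dict[str, bytes] = {}   # stand-in for an NVMe cache
        self.ram: Dict[str, bytes] = {}   # stand-in for an in-memory cache
        self.object_store_reads = 0       # counts slow, cold-path reads

    def get(self, key: str) -> Optional[bytes]:
        if key in self.ram:
            return self.ram[key]
        if key in self.ssd:
            self.ram[key] = self.ssd[key]  # promote a warm entry to RAM
            return self.ram[key]
        if key not in self.object_store:
            return None
        self.object_store_reads += 1       # cold path: object-storage round trip
        value = self.object_store[key]
        self.ssd[key] = value              # populate both cache tiers
        self.ram[key] = value
        return value

    def drop_caches(self) -> None:
        """Simulate losing a cache node: no data is lost, reads go cold again."""
        self.ssd.clear()
        self.ram.clear()


reader = TieredReader({"ns1/index": b"centroids"})
reader.get("ns1/index")  # cold: served from the object store
reader.get("ns1/index")  # warm: served from RAM
print(reader.object_store_reads)
```

Because the caches hold only copies, dropping them costs latency on the next read but never durability.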
Traditional vector databases keep data in RAM plus replicated SSD, which is expensive and operationally heavy. Turbopuffer instead makes object storage the system of record and caches hot data on NVMe/RAM. It exploits S3's strong read-after-write consistency (available since December 2020) and compare-and-swap via conditional writes (added in 2024) to get durability and correctness cheaply. This is the canonical example of building retrieval infrastructure natively on object storage.
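To make the role of compare-and-swap concrete, here is a minimal sketch of a batched commit against a toy in-memory store. Everything in it is an assumption for illustration: `InMemoryObjectStore`, `put_if_version`, and the single-manifest layout are hypothetical and do not reflect Turbopuffer's actual object layout or any real S3 client API.

```python
import json
from typing import Dict, List, Optional, Tuple


class InMemoryObjectStore:
    """Toy store with a versioned conditional put, standing in for S3/GCS."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[int, bytes]] = {}

    def get(self, key: str) -> Tuple[int, Optional[bytes]]:
        # Version 0 with no body means the object does not exist yet.
        return self._objects.get(key, (0, None))

    def put_if_version(self, key: str, body: bytes, expected_version: int) -> bool:
        current, _ = self._objects.get(key, (0, None))
        if current != expected_version:
            return False  # another writer committed first
        self._objects[key] = (current + 1, body)
        return True


def group_commit(store: InMemoryObjectStore, namespace: str,
                 pending_writes: List[dict], max_retries: int = 5) -> int:
    """Commit a whole batch as one log entry; retry if the CAS loses a race."""
    manifest_key = f"{namespace}/manifest"  # hypothetical key layout
    for _ in range(max_retries):
        version, body = store.get(manifest_key)
        entries = json.loads(body) if body else []
        entries.append(pending_writes)  # one entry holds the whole batch
        new_body = json.dumps(entries).encode()
        if store.put_if_version(manifest_key, new_body, version):
            return len(entries)         # number of committed log entries
    raise RuntimeError("too many concurrent committers")


store = InMemoryObjectStore()
print(group_commit(store, "ns1", [{"id": 1}, {"id": 2}]))
print(group_commit(store, "ns1", [{"id": 3}]))
```

The conditional put is what lets multiple stateless writers share a namespace safely: a writer holding a stale version fails its put and must re-read before retrying, so no committed entry is silently overwritten.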
Typical use cases: vector search for RAG, full-text/BM25 search, recommendation and semantic search, multi-tenant AI search with many namespaces, and large cold-but-searchable corpora.
Recent developments
- Turbopuffer's architecture uses a write-ahead log on object storage as the durable source of truth, with NVMe SSD and RAM as caches. A successful write is guaranteed to be durably persisted to object storage. The first query to a namespace reads S3 directly (p50 ~874 ms for 1M docs), while cached queries drop to p50 ~14 ms. Per Turbopuffer Architecture.
- It relies on S3's strong consistency and compare-and-swap to coordinate commits, batching concurrent writes per namespace into group commits. Per Turbopuffer Architecture.
- The vector index uses SPFresh, a centroid-based approximate-nearest-neighbor index chosen to minimize roundtrips and write amplification on object storage versus graph-based indexes. Per turbopuffer: fast search on object storage.
- Production users include Cursor, Notion, Linear, Superhuman, Anthropic, Atlassian, Grammarly, and Harvey; the system reports 4T+ documents, 10M+ writes/sec, and 25k+ queries/sec. Per turbopuffer — fast search engine built on object storage.
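The round-trip argument for a centroid-based index, mentioned in the list above, can be illustrated with a simplified partitioned search. This is not SPFresh itself, only a generic centroid-partitioned lookup; the function names, the `nprobe` parameter, and the tiny 2-D dataset are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_partitions(centroids: List[Vector],
                     vectors: Dict[str, Vector]) -> Dict[int, List[Tuple[str, Vector]]]:
    """Assign every vector to the partition of its nearest centroid."""
    partitions: Dict[int, List[Tuple[str, Vector]]] = {i: [] for i in range(len(centroids))}
    for doc_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))
        partitions[nearest].append((doc_id, vec))
    return partitions


def search(query: Vector, centroids: List[Vector],
           partitions: Dict[int, List[Tuple[str, Vector]]],
           nprobe: int = 1, k: int = 2) -> List[str]:
    """Round trip 1: rank centroids. Round trip 2: scan the nprobe closest partitions."""
    probe = sorted(range(len(centroids)),
                   key=lambda i: math.dist(centroids[i], query))[:nprobe]
    candidates = [item for i in probe for item in partitions[i]]
    candidates.sort(key=lambda item: math.dist(item[1], query))
    return [doc_id for doc_id, _ in candidates[:k]]


centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = {"a": (0.0, 1.0), "b": (1.0, 0.0), "c": (9.0, 10.0), "d": (10.0, 9.0)}
partitions = build_partitions(centroids, vectors)
print(search((0.2, 0.9), centroids, partitions))
```

In an object-storage setting each partition could be stored as one object, so a query needs a fixed, small number of fetches (the centroid list, then the probed partitions). A graph index, by contrast, typically follows a chain of dependent hops, each of which would be a separate round trip.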
Resources
Authoritative description of the object-storage WAL source-of-truth, SSD/RAM caching tiers, consistency, and cold/warm latency.
Founding design rationale, including the SPFresh ANN index choice for object storage and the cost argument.
Named production customers and scale metrics (4T+ docs, 25k+ QPS) plus the cost-disruption positioning.