Technology

Turbopuffer

An object-storage-native vector and full-text search engine where S3/GCS is the durable source of truth and SSD/RAM are caches, built around S3 strong consistency and compare-and-swap.

Summary

Where it fits

It is the reference design for retrieval infrastructure on object storage, competing with RAM-first vector DBs (Pinecone, Qdrant) and pgvector on cost and operational simplicity. Its disruption angle is storage cost: roughly 10x cheaper, achieved by keeping cold data on S3 (~$0.02/GB per month) instead of in DRAM or on replicated SSD.

Misconceptions / Traps

It is not RAM-resident: the first query to a cold namespace hits object storage and is slow (~874 ms p50 for 1M docs), so the low cost comes with cold-start latency.
Write throughput per namespace is bounded by group-commit batching (historically ~1 WAL entry/sec/namespace), so it favors many namespaces over a single hot one.
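
The per-namespace write bound can be illustrated with a small simulation. This is a minimal sketch, not Turbopuffer's implementation: `NamespaceWriter`, `commit_interval_ms`, `run`, and the one-second interval are hypothetical names and values chosen only to mirror the ~1 WAL entry/sec/namespace figure above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NamespaceWriter:
    """Hypothetical per-namespace group committer (illustrative only)."""

    commit_interval_ms: int = 1000  # assumption: ~1 WAL entry per second
    pending: list = field(default_factory=list)
    wal: list = field(default_factory=list)  # each element is one committed batch
    last_commit_ms: Optional[int] = None

    def write(self, doc: dict, now_ms: int) -> None:
        """Buffer a write, then commit if the interval has elapsed."""
        self.pending.append(doc)
        self.maybe_commit(now_ms)

    def maybe_commit(self, now_ms: int) -> bool:
        """Flush every buffered write as ONE WAL entry, at most once per interval."""
        if not self.pending:
            return False
        too_soon = (
            self.last_commit_ms is not None
            and now_ms - self.last_commit_ms < self.commit_interval_ms
        )
        if too_soon:
            return False
        self.wal.append(list(self.pending))
        self.pending.clear()
        self.last_commit_ms = now_ms
        return True


def run(num_writes: int, duration_ms: int) -> NamespaceWriter:
    """Spread `num_writes` evenly over `duration_ms` against one namespace."""
    writer = NamespaceWriter()
    for i in range(num_writes):
        writer.write({"id": i}, now_ms=duration_ms * i // num_writes)
    writer.maybe_commit(duration_ms)  # final flush at the end of the window
    return writer


light = run(num_writes=100, duration_ms=10_000)
heavy = run(num_writes=10_000, duration_ms=10_000)
print(len(light.wal), len(heavy.wal))
```

In this model a heavier write load on one namespace does not produce more WAL entries, only larger batches. Under these assumptions, commit count grows by adding namespaces, which is why the design favors many namespaces over a single hot one.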

Key Connections

depends_on S3 API — uses S3 strong consistency + CAS as its correctness foundation.
alternative_to Pinecone — same vector-search job, but object-storage-first economics.
optimizes_for Retrieval Engineering — purpose-built for RAG/semantic-search retrieval at low cost.

Definition

What it is

Turbopuffer is a search engine built from first principles on object storage, providing vector search and full-text/BM25 search with object storage as the durable source of truth. SSD and RAM act only as caches in front of S3/GCS. It is designed for AI applications, semantic search, and recommendation systems at very low cost.
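
The "caches only" relationship can be sketched as a read-through lookup. This is an illustrative model under stated assumptions, not Turbopuffer's code: `TieredReader`, the dictionaries standing in for RAM, NVMe, and S3/GCS, and the key name are all hypothetical.

```python
class TieredReader:
    """Hypothetical read path: RAM -> SSD -> object storage (illustrative only)."""

    def __init__(self, object_store: dict):
        self.object_store = object_store  # stand-in for S3/GCS, the source of truth
        self.ssd: dict = {}               # stand-in for an NVMe cache
        self.ram: dict = {}               # stand-in for an in-memory cache
        self.hits = {"ram": 0, "ssd": 0, "object_store": 0}

    def get(self, key: str) -> bytes:
        if key in self.ram:
            self.hits["ram"] += 1
            return self.ram[key]
        if key in self.ssd:
            self.hits["ssd"] += 1
            value = self.ssd[key]
        else:
            self.hits["object_store"] += 1
            value = self.object_store[key]  # cold read: the slow path
            self.ssd[key] = value
        self.ram[key] = value  # promote into the faster tier
        return value

    def drop_caches(self) -> None:
        """Losing both caches loses no data; it only makes the next read cold."""
        self.ssd.clear()
        self.ram.clear()


store = {"namespace-a/segment-0": b"segment bytes"}
reader = TieredReader(store)
first = reader.get("namespace-a/segment-0")   # cold: served from the object store
second = reader.get("namespace-a/segment-0")  # warm: served from RAM
reader.drop_caches()
third = reader.get("namespace-a/segment-0")   # cold again, but still correct
print(reader.hits)
```

The sketch shows that a cache node can be wiped or replaced without any recovery step, because nothing authoritative lives outside the object store.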

Why it exists

Traditional vector databases keep data in RAM plus replicated SSD, which is expensive and operationally heavy. Turbopuffer makes object storage the system of record and caches hot data on NVMe/RAM, exploiting S3's strong read-after-write consistency (December 2020) and compare-and-swap via conditional writes (2024) to get durability and correctness cheaply. It is the canonical example of building retrieval infrastructure natively on object storage.
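
To make the role of compare-and-swap concrete, here is a minimal sketch of a read-modify-write loop against a toy, in-memory object store. It is an assumption-laden illustration, not the S3 API or Turbopuffer's commit protocol: `InMemoryObjectStore`, `put_if_version`, `append_entry`, and the `ns/manifest.json` key are hypothetical. Integer versions stand in for whatever token a real store compares on a conditional write.

```python
import json
from typing import Optional, Tuple


class PreconditionFailed(Exception):
    """Raised when the object changed since the caller last read it."""


class InMemoryObjectStore:
    """Toy object store with versioned objects (a stand-in, not the S3 API)."""

    def __init__(self) -> None:
        self._objects: dict = {}  # key -> (version, body)

    def get(self, key: str) -> Tuple[Optional[int], Optional[bytes]]:
        return self._objects.get(key, (None, None))

    def put_if_version(self, key: str, body: bytes, expected: Optional[int]) -> int:
        """Write only if the stored version equals `expected` (None = must not exist)."""
        current, _ = self.get(key)
        if current != expected:
            raise PreconditionFailed(key)
        new_version = 1 if current is None else current + 1
        self._objects[key] = (new_version, body)
        return new_version


def append_entry(store: InMemoryObjectStore, key: str, entry: str, max_retries: int = 5) -> int:
    """Read-modify-write with compare-and-swap; retry if another writer won the race."""
    for _ in range(max_retries):
        version, body = store.get(key)
        entries = json.loads(body) if body is not None else []
        entries.append(entry)
        try:
            return store.put_if_version(key, json.dumps(entries).encode(), version)
        except PreconditionFailed:
            continue  # someone else committed first: re-read and try again
    raise RuntimeError("too much contention")


store = InMemoryObjectStore()
append_entry(store, "ns/manifest.json", "wal-0001")

# A second writer holds a stale version while another commit lands.
stale_version, _ = store.get("ns/manifest.json")
append_entry(store, "ns/manifest.json", "wal-0002")
try:
    store.put_if_version("ns/manifest.json", b'["wal-0001", "lost-update"]', stale_version)
    stale_write_rejected = False
except PreconditionFailed:
    stale_write_rejected = True

final_entries = json.loads(store.get("ns/manifest.json")[1])
print(stale_write_rejected, final_entries)
```

Strong consistency guarantees that the `get` sees the latest committed version, and the conditional write guarantees that a writer holding an older version cannot silently overwrite it. Together they allow coordination through the object store alone, without a separate consensus service.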

Primary use cases

Vector search for RAG, full-text/BM25 search, recommendation and semantic search, multi-tenant AI search with many namespaces, and large cold-but-searchable corpora.

Recent developments

Latest signals

Turbopuffer's architecture uses a write-ahead log on object storage as the durable source of truth, with NVMe SSD and RAM as caches. A successful write is guaranteed to be durably persisted to object storage. The first query to a cold namespace reads from object storage directly (p50 ~874 ms for 1M docs), while cached queries drop to p50 ~14 ms. Per Turbopuffer Architecture.
It relies on S3's strong consistency and compare-and-swap to coordinate commits, batching concurrent writes per namespace into group commits. Per Turbopuffer Architecture.
The vector index uses SPFresh, a centroid-based approximate-nearest-neighbor index chosen to minimize roundtrips and write amplification on object storage versus graph-based indexes. Per turbopuffer: fast search on object storage.
Production users include Cursor, Notion, Linear, Superhuman, Anthropic, Atlassian, Grammarly, and Harvey; the system reports 4T+ documents, 10M+ writes/sec, and 25k+ queries/sec. Per turbopuffer — fast search engine built on object storage.
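
The centroid-based indexing idea mentioned in the signals above can be illustrated with a deliberately simplified sketch. This is not SPFresh: it is a plain cluster-and-probe index with fixed centroids, and it omits the machinery that keeps clusters balanced as data changes. `CentroidIndex`, `nprobe`, and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidIndex:
    """Toy centroid-based ANN index (illustrative only, not SPFresh)."""

    def __init__(self, centroids: List[Tuple[float, ...]]):
        self.centroids = [tuple(c) for c in centroids]
        # One posting list per centroid; think of each as one object to fetch.
        self.postings = {i: [] for i in range(len(self.centroids))}

    def _nearest_centroids(self, vector: Sequence[float], n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(vector, self.centroids[i]))
        return order[:n]

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        cluster = self._nearest_centroids(vector, 1)[0]
        self.postings[cluster].append((doc_id, tuple(vector)))

    def search(self, query: Sequence[float], k: int = 3, nprobe: int = 1) -> List[str]:
        """Scan only the `nprobe` closest clusters, then rank their members."""
        candidates = []
        for cluster in self._nearest_centroids(query, nprobe):
            candidates.extend(self.postings[cluster])  # one fetch per probed cluster
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:k]]


index = CentroidIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
for doc_id, vec in [("a", (0, 1)), ("b", (1, 0)), ("c", (9, 10)), ("d", (10, 9))]:
    index.add(doc_id, vec)

near_origin = index.search((0.2, 0.9), k=1)
one_probe = index.search((4, 4), k=4, nprobe=1)
two_probes = index.search((4, 4), k=4, nprobe=2)
print(near_origin, one_probe, two_probes)
```

A query here needs the centroid list plus a bounded number of posting lists, which can be fetched in parallel. A graph index typically follows a chain of dependent reads instead, and on object storage each of those hops would pay a full round trip. Raising `nprobe` trades extra fetches for better recall.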