Architecture

Decoupled Vector Search

A vector database architecture that separates index storage on object storage from query compute, using Inverted File Indexes (IVF) and Product Quantization (PQ) to compress the in-memory vector footprint by approximately 64x while fetching full-precision vectors from S3 only for final re-ranking.

Summary

Where it fits

Replaces monolithic, RAM-bound HNSW-based vector databases for billion-scale retrieval. The foundational architecture behind Amazon S3 Vectors and Databricks Vector Search. Enables enterprise RAG pipelines without provisioned vector database clusters by tiering the index to object storage and keeping only compressed cluster centroids in memory.

Misconceptions / Traps
  • Not too slow for every workload: warm query latency is approximately 100 ms, which is sufficient for non-real-time RAG even though in-memory indexes typically respond faster.
  • Does not eliminate the need for embeddings — it changes where and how they are stored and queried.
  • Requires a dual-runtime query engine separating asynchronous I/O threads from CPU-bound distance computation to prevent network latency from starving compute cores.
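
The dual-runtime requirement in the last point can be illustrated with a minimal sketch. It is not any vendor's implementation: `fetch_range` is a hypothetical stand-in for an S3 byte-range GET (it only sleeps to mimic network latency), and the key name `index.bin` is invented for illustration. The point it shows is the split itself: fetches run concurrently on an asyncio event loop, while distance math is handed to a separate worker pool.

```python
import asyncio
import math
from concurrent.futures import ThreadPoolExecutor


async def fetch_range(key: str, start: int, length: int) -> list[float]:
    """Hypothetical stand-in for an S3 byte-range GET (assumption, not a real API)."""
    await asyncio.sleep(0.01)  # simulated network round trip
    return [float(start + i) for i in range(length)]


def l2_distance(query: list[float], vector: list[float]) -> float:
    """CPU-bound distance computation, run off the event loop."""
    return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))


async def search(query: list[float], starts: list[int], top_k: int = 2) -> list[int]:
    loop = asyncio.get_running_loop()
    # I/O runtime: every range read is in flight at the same time.
    vectors = await asyncio.gather(
        *(fetch_range("index.bin", s, len(query)) for s in starts)
    )
    # Compute runtime: distance math goes to a separate pool so it never
    # blocks the event loop that is driving the network reads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        dists = await asyncio.gather(
            *(loop.run_in_executor(pool, l2_distance, query, v) for v in vectors)
        )
    ranked = sorted(zip(dists, starts))
    return [start for _, start in ranked[:top_k]]


if __name__ == "__main__":
    print(asyncio.run(search([0.0, 1.0, 2.0], [0, 10, 5])))  # prints [0, 5]
```

In CPython a thread pool does not give true parallelism for pure-Python math because of the global interpreter lock; a production engine would more plausibly use native code or a process pool for the compute side. The sketch only demonstrates the separation of the two kinds of work.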

Key Connections
  • enables Amazon S3 Vectors — S3 Vectors implements this architecture natively at the storage layer
  • enables RAG over Structured Data — makes billion-scale semantic retrieval economically viable on S3
  • scoped_to Vector Indexing on Object Storage — the architecture that defines how vectors live on object storage

Definition

Why it exists

Storing full-precision HNSW graphs entirely in RAM becomes economically unviable beyond approximately 100 million vectors. Decoupled vector search maps IVF clusters to contiguous S3 object fragments, keeps only compressed centroids in memory, and issues concurrent byte-range reads to fetch full-precision vectors for final distance computation. This requires a dual-runtime query engine where asynchronous I/O threads are strictly separated from CPU-bound mathematical threads.
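
The layout described above, with each IVF cluster stored as one contiguous fragment and fetched by byte range, can be sketched in a few lines. This is a simplified illustration under stated assumptions: an in-memory `bytes` object stands in for the S3 object, `range_read` is a hypothetical substitute for a ranged GET, vectors are float32, and the PQ candidate-filtering step is omitted so that only the centroid probe and the full-precision re-rank remain.

```python
import struct

DIM = 4                # toy dimensionality (assumption for the example)
VEC_BYTES = DIM * 4    # float32 components


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(vectors, centroids):
    """Assign each vector to its nearest centroid and store every cluster
    as one contiguous byte fragment; return the blob and a range table."""
    clusters = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        clusters[nearest].append(v)
    blob = bytearray()
    ranges = {}
    for cid, members in clusters.items():
        start = len(blob)
        for v in members:
            blob += struct.pack(f"<{DIM}f", *v)
        ranges[cid] = (start, len(blob) - start)  # (offset, length)
    return bytes(blob), ranges


def range_read(blob, start, length):
    """Hypothetical stand-in for a ranged GET against object storage."""
    return blob[start:start + length]


def search(query, centroids, blob, ranges, nprobe=1, top_k=1):
    # Step 1: only the centroids are consulted in memory.
    probe = sorted(range(len(centroids)),
                   key=lambda i: sq_dist(query, centroids[i]))[:nprobe]
    # Step 2: one byte-range read per probed cluster.
    candidates = []
    for cid in probe:
        start, length = ranges[cid]
        raw = range_read(blob, start, length)
        for off in range(0, len(raw), VEC_BYTES):
            candidates.append(struct.unpack_from(f"<{DIM}f", raw, off))
    # Step 3: re-rank the fetched full-precision vectors.
    candidates.sort(key=lambda v: sq_dist(query, v))
    return candidates[:top_k]


centroids = [(0.0, 0.0, 0.0, 0.0), (10.0, 10.0, 10.0, 10.0)]
vectors = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.0),
           (10.0, 10.0, 10.0, 9.0), (9.0, 10.0, 10.0, 10.0)]
blob, ranges = build_index(vectors, centroids)
print(search((9.0, 10.0, 10.0, 10.0), centroids, blob, ranges))
```

Because each cluster occupies a single contiguous range, a query that probes `nprobe` clusters needs only `nprobe` reads, and those reads can be issued concurrently.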

Primary use cases

Billion-scale enterprise RAG pipelines, serverless vector retrieval via S3 Vectors, cost-efficient semantic search without provisioned vector database clusters.

Recent developments

Latest signals
  • LanceDB on S3 + AWS Lambda is the 2026 reference serverless implementation. Decoupled storage (Lance columnar on S3) + stateless query compute (Lambda) — automatic scaling per workload, no always-on infrastructure to manage. The "vector database without a vector-database cluster" pattern. Per Grokipedia — Vector Search in LanceDB and AWS Architecture Blog — Scalable Elastic Database for 1B+ Vectors on LanceDB + S3.
  • IVF-PQ at 200M vectors per bucket: partitions = √rows; subvectors = dims/16. AWS's published reference implementation creates LanceDB tables with an IVF-PQ index on cosine distance for each 200M-vector bucket, setting the partition count to the square root of the row count and the subvector count to the vector dimensionality divided by 16. This is the empirical tuning rule the reference design relies on for billion-scale serverless vector search. Per AWS Architecture Blog, "Scalable 1B+ Vector Solution on LanceDB + S3".
  • LanceDB 2026 launches: Lance-native SQL via DuckDB, multi-bucket storage, 1.5M IOPS benchmarks. Q1 2026 LanceDB shipped DuckDB SQL retrieval against Lance natively, Uber-scale multi-bucket storage, and 1.5M IOPS benchmarks against object storage. The serverless-vector-search-on-S3 architecture is no longer constrained by IOPS ceilings. Per LanceDB — AI-Native Multimodal Lakehouse.
  • Mindshare growth: 6.7% → 9.6% YoY (steepest among 9 leading vector databases). MarkTechPost's 2026 nine-database comparison ranks LanceDB as the fastest-growing entrant — driven by interest in serverless + multimodal architectures. The serverless-vector-search pattern is winning architectural mindshare. Per MarkTechPost — Best Vector Databases in 2026: Pricing, Scale Limits, Architecture Tradeoffs.
  • IVF-PQ compresses in-memory footprint ~64×. Inverted File Indexes (IVF) plus Product Quantization (PQ) keep only compressed centroids in RAM; full-precision vectors stay on S3 and are loaded via concurrent byte-range reads for final re-ranking. This is the mechanism that makes billion-scale serverless search work. Per LanceDB Blog, "IVF_PQ: Accelerate Vector Search by Creating Indices".
  • Dual-runtime engine requirement: async I/O threads separated from CPU-bound math. Production decoupled vector search requires strict separation of I/O-bound threads (fetching from S3) from CPU-bound threads (distance computation); single-threaded designs stall because compute sits idle while waiting on S3 round trips. This constraint determines which vector databases scale. Per Medium, "The Future of Vector Search: Exploring LanceDB for Billion-Scale Search".
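
The tuning rule and the ~64× figure in the signals above can be checked with a short calculation. The sketch below is illustrative only: the function names and dictionary keys are invented for the example, and the compression arithmetic assumes float32 vectors and 8-bit PQ codes, which the source does not state. Under those assumptions, dims/16 subvectors means each group of 16 floats (64 bytes) is replaced by a 1-byte code, which gives 64× regardless of dimensionality. Codebooks and centroids are not counted.

```python
import math


def ivf_pq_params(num_rows: int, dims: int) -> dict:
    """Apply the rule quoted above: partitions = sqrt(rows), subvectors = dims / 16."""
    return {
        "num_partitions": math.isqrt(num_rows),  # integer square root
        "num_sub_vectors": dims // 16,
    }


def compression_ratio(dims: int, num_sub_vectors: int,
                      bytes_per_component: int = 4,   # float32 (assumption)
                      bits_per_code: int = 8) -> float:  # 8-bit PQ codes (assumption)
    raw_bytes = dims * bytes_per_component
    code_bytes = num_sub_vectors * bits_per_code / 8
    return raw_bytes / code_bytes


params = ivf_pq_params(200_000_000, 1024)
print(params)  # {'num_partitions': 14142, 'num_sub_vectors': 64}
print(compression_ratio(1024, params["num_sub_vectors"]))  # 64.0
```

For a 200M-row bucket of 1024-dimensional vectors this yields 14,142 partitions and 64 subvectors, so a 4,096-byte vector is represented by a 64-byte code.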
