Decoupled Vector Search
A vector database architecture that separates index storage on object storage from query compute, using Inverted File Indexes (IVF) and Product Quantization (PQ) to compress the in-memory vector footprint by approximately 64x while fetching full-precision vectors from S3 only for final re-ranking.
Summary
A vector database architecture that separates index storage on object storage from query compute, using Inverted File Indexes (IVF) and Product Quantization (PQ) to compress the in-memory vector footprint by approximately 64x while fetching full-precision vectors from S3 only for final re-ranking.
Replaces monolithic, RAM-bound HNSW-based vector databases for billion-scale retrieval. The foundational architecture behind Amazon S3 Vectors and Databricks Vector Search. Enables enterprise RAG pipelines without provisioned vector database clusters by tiering the index to object storage and keeping only compressed cluster centroids in memory.
- Not inherently slower than in-memory search for all workloads — warm query latency reaches approximately 100ms, sufficient for non-real-time RAG.
- Does not eliminate the need for embeddings — it changes where and how they are stored and queried.
- Requires a dual-runtime query engine separating asynchronous I/O threads from CPU-bound distance computation to prevent network latency from starving compute cores.
enablesAmazon S3 Vectors — S3 Vectors implements this architecture natively at the storage layerenablesRAG over Structured Data — makes billion-scale semantic retrieval economically viable on S3scoped_toVector Indexing on Object Storage — the architecture that defines how vectors live on object storage
Definition
A vector database architecture that separates index storage on object storage from query compute, using Inverted File Indexes (IVF) and Product Quantization (PQ) to compress the in-memory vector footprint by approximately 64x while fetching full-precision vectors from S3 only for final re-ranking.
Storing full-precision HNSW graphs entirely in RAM becomes economically unviable beyond approximately 100 million vectors. Decoupled vector search maps IVF clusters to contiguous S3 object fragments, keeps only compressed centroids in memory, and issues concurrent byte-range reads to fetch full-precision vectors for final distance computation. This requires a dual-runtime query engine where asynchronous I/O threads are strictly separated from CPU-bound mathematical threads.
Billion-scale enterprise RAG pipelines, serverless vector retrieval via S3 Vectors, cost-efficient semantic search without provisioned vector database clusters.
Connections 5
Outbound 5
scoped_to2solves1Resources 3
Official AWS documentation for S3 Vectors, the primary implementation of decoupled vector search on object storage, including index limits and pricing.
AWS engineering blog detailing the architecture of native vector support in S3, covering IVF indexing and approximate nearest neighbor mechanics.
Practical guide to building serverless RAG pipelines using decoupled vector search on S3 with Amazon Bedrock integration.