Choosing a Vector Database for S3 Workloads
Problem Framing
The vector database market has fragmented into three distinct architectural tiers, each with a fundamentally different relationship to S3-compatible object storage. Serverless embedded engines (LanceDB) store indexes directly on S3 and require no infrastructure. Stateful standalone servers (Weaviate, Qdrant) maintain indexes in local memory or disk with optional S3 tiering. Distributed clusters (Milvus) shard indexes across nodes with S3 as persistent cold storage.
Choosing wrong has real consequences. Running Milvus for 10 million vectors wastes operational budget on cluster management you don't need. Using LanceDB when your application requires sub-10ms filtered retrieval hits a physics wall — S3 HTTP round-trips have a latency floor that no amount of caching fully eliminates. The decision hinges on your latency requirements, vector scale, S3 integration model, and team's operational capacity.
Relevant Nodes
- Topics: Vector Indexing on Object Storage, S3
- Technologies: LanceDB, Weaviate, Qdrant, Milvus
- Standards: Lance Format, S3 API
- Architectures: Hybrid S3 + Vector Index, Decoupled Vector Search, Separation of Storage and Compute
- Pain Points: Cold Scan Latency
- Model Classes: Embedding Model
- LLM Capabilities: Semantic Search, Embedding Generation
Decision Path
Determine your latency requirement. If your application needs consistent sub-10ms retrieval (voice agents, real-time recommendation), you need a stateful server with indexes in memory — Weaviate or Qdrant. If sub-second latency (100-500ms) is acceptable (batch RAG, document search, async agents), LanceDB querying S3 directly is viable and eliminates all server infrastructure.
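Before committing to a tier, it helps to measure the latency you actually observe rather than the latency you assume. A minimal sketch of a p95 measurement helper; the query callable is a hypothetical stand-in for whichever client you are evaluating:

```python
import statistics
import time

def p95_ms(query_fn, n: int = 100) -> float:
    """Run query_fn n times and return the 95th-percentile latency in ms.

    query_fn is a hypothetical zero-argument callable wrapping one retrieval
    against the engine under test (LanceDB-on-S3, Qdrant, etc.).
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        query_fn()
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile
    return statistics.quantiles(samples, n=20)[-1]
```

If the measured p95 against a stateless S3-backed index already sits inside your budget, the sub-10ms tier (and its server footprint) may be unnecessary.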
Estimate your vector scale. Anything under 100 million vectors fits comfortably on a single node — Qdrant or Weaviate. Between 100M and 1B, consider Weaviate with S3-tiered cold storage or LanceDB with NVMe caching. Above 1 billion vectors, Milvus is the only open-source option that distributes the index across a cluster with S3 cold offload.
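The single-node boundary is mostly a RAM question. A back-of-envelope estimator, assuming float32 vectors and a ~1.5x HNSW graph overhead (both assumptions of this sketch, not figures from the engines' documentation; real overhead varies with index parameters):

```python
def index_memory_gb(num_vectors: int, dims: int,
                    bytes_per_dim: int = 4,
                    hnsw_overhead: float = 1.5) -> float:
    """Rough RAM estimate for an in-memory HNSW index.

    Assumes float32 storage (4 bytes/dim) and ~1.5x overhead for
    the HNSW link graph; tune hnsw_overhead for your M/ef settings.
    """
    raw_bytes = num_vectors * dims * bytes_per_dim
    return raw_bytes * hnsw_overhead / 1e9

# 100M vectors at 768 dims: ~461 GB with graph overhead --
# consistent with the single-node boundary described above.
print(round(index_memory_gb(100_000_000, 768)))
```

Running the same arithmetic at 10M vectors lands around 46 GB, which is why that scale fits a single Qdrant or Weaviate node without drama.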
Decide if you need native hybrid search. Weaviate provides BM25 + vector fusion in a single query out of the box. Qdrant supports payload filtering during HNSW traversal but not full-text BM25. LanceDB supports full-text search alongside vector queries. Milvus added hybrid search, but it is less mature than Weaviate's implementation.
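When an engine lacks native fusion, hybrid search can be approximated client-side. A sketch of reciprocal rank fusion (RRF), the standard technique in this family, combining two independently produced result lists; the document IDs below are hypothetical:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1/(k + rank).

    Each inner list is one ranked result set (e.g. BM25 hits, vector hits).
    Returns doc IDs sorted by fused score, best first.
    """
    scores: dict[str, float] = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: one from a BM25 engine, one from vector search.
bm25_hits = ["doc_a", "doc_c", "doc_b"]
vector_hits = ["doc_b", "doc_a", "doc_d"]
print(rrf_fuse([bm25_hits, vector_hits]))  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Note that client-side fusion costs two round-trips and loses the single-query atomicity that makes Weaviate's built-in fusion attractive.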
Clarify where your data lives. If S3 is the source of truth and you want zero data duplication, LanceDB is the natural fit — the index is the S3 data. Every other option requires a sync pipeline between S3 and the vector database, introducing embedding-drift risk: documents change on S3 while their indexed vectors go stale.
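A sync pipeline needs a staleness check at its core. A minimal sketch, assuming you record each object's S3 ETag at embedding time (a convention of this sketch, not a feature of any of the engines above) and compare it against a fresh listing:

```python
def stale_documents(s3_etags: dict[str, str],
                    indexed_etags: dict[str, str]) -> set[str]:
    """Return keys whose S3 object changed (or appeared) since embedding.

    s3_etags: key -> ETag from a current ListObjectsV2 sweep.
    indexed_etags: key -> ETag stored alongside each vector at embed time.
    """
    return {key for key, etag in s3_etags.items()
            if indexed_etags.get(key) != etag}

current = {"a.md": "etag-v2", "b.md": "etag-v1", "c.md": "etag-v1"}  # live S3
indexed = {"a.md": "etag-v1", "b.md": "etag-v1"}  # state at embed time
print(sorted(stale_documents(current, indexed)))  # ['a.md', 'c.md']
```

With LanceDB this entire machinery disappears, because there is no second copy to drift.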
Assess your operational budget. LanceDB requires zero infrastructure — import the library, point at S3, query. Qdrant and Weaviate require a server process with monitoring, backups, and capacity planning. Milvus requires etcd, a message queue (Pulsar or Kafka), and S3 — a multi-component distributed system demanding dedicated platform engineering.
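The five steps above can be collapsed into a sketch. The thresholds come from this decision path; the tie-break between Qdrant and Weaviate on the hybrid-search flag is this sketch's own assumption:

```python
def pick_engine(p95_latency_ms: float, num_vectors: int,
                needs_bm25_fusion: bool,
                s3_is_source_of_truth: bool) -> str:
    """Sketch of the decision path above, in priority order."""
    if num_vectors > 1_000_000_000:
        return "Milvus"            # only open-source distributed option
    if p95_latency_ms < 10:
        # sub-10ms requires an in-memory stateful server
        return "Weaviate" if needs_bm25_fusion else "Qdrant"
    if s3_is_source_of_truth:
        return "LanceDB"           # index lives on S3, no sync pipeline
    return "Weaviate" if needs_bm25_fusion else "Qdrant"

# Batch RAG over S3 documents, 50M vectors, no BM25 requirement:
print(pick_engine(200, 50_000_000, False, True))  # LanceDB
```

Ordering matters: scale dominates because no single-node engine survives past a billion vectors, and latency dominates S3-residency because the S3 round-trip floor is physical, not configurable.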
What Changed Over Time
- LanceDB matured from experimental to production-grade (2024-2025), with NVMe caching reducing S3 query latency from >200ms to ~25ms for warm data.
- Weaviate added S3-tiered storage for multi-tenant cold data offload, bridging the gap between stateful and S3-native architectures.
- Milvus adopted S3 as a first-class cold storage tier, storing segments and logs durably on object storage while keeping hot data on SSD.
- Qdrant's Rust-based engine emerged as the performance-per-watt leader for single-node deployments, particularly attractive for self-hosted labs.
- The distinction between "vector database" and "vector index on S3" became the primary architectural decision, replacing the earlier "which vector DB has the best benchmarks" framing.