The S3 Vector Database Landscape: Choosing Between Embedded, Standalone, and Distributed

The Three-Tier Split

The vector database market in 2026 is not one market. It is three, separated by architectural philosophy.

Tier 1: Embedded / Serverless on S3. LanceDB stores its index directly on S3-compatible object storage using the Lance Format. There is no server. The application imports a library, points at an S3 bucket, and queries. The index is the storage — vectors, metadata, and raw payloads live as immutable fragments on the same object store that holds the rest of your data lake.1
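In code, the whole stack is a client library. A minimal sketch, assuming the lancedb Python client; the bucket path, schema, and values are illustrative:

```python
import lancedb

# No server: the library reads and writes Lance fragments on S3 directly.
db = lancedb.connect("s3://my-data-lake/vectors")

# One table holds the vector, the metadata, and the raw payload together.
table = db.create_table(
    "documents",
    data=[
        {"vector": [0.12, 0.98, 0.31, 0.44], "text": "Quarterly report", "source": "finance"},
        {"vector": [0.77, 0.05, 0.62, 0.19], "text": "Incident postmortem", "source": "ops"},
    ],
)

# Nearest-neighbour results are read straight from the fragments on S3.
results = table.search([0.11, 0.95, 0.30, 0.42]).limit(5).to_list()
```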

Tier 2: Stateful Standalone. Weaviate and Qdrant run as persistent server processes. They maintain HNSW indexes in local memory or SSD, serve queries with sub-10ms latency, and optionally tier cold data to S3. The index and the storage are separate systems — S3 holds the source documents, the vector database holds the mathematical representation.2

Tier 3: Distributed. Milvus distributes the vector index across a cluster of nodes with SSD caching for hot segments and S3 for cold persistent storage. It is the only open-source option that credibly handles billions of vectors — at the cost of operating etcd, a message queue, and S3 as a coordinated system.3

These are not points on a spectrum. They are fundamentally different architectures with different failure modes, different operational costs, and different relationships to S3.

S3 as Source of Truth vs. S3 as Backup

The most important distinction is not performance benchmarks. It is where the authoritative state lives.

In the LanceDB model, S3 is the source of truth. Vectors, metadata, and text payloads are stored as a unified table on the object store. When a document is deleted from S3, the corresponding vector is deleted from the same location. There is no synchronization problem because there is only one system.1
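Continuing the sketch above, removal is a single operation against the same table; the SQL-style predicate is illustrative:

```python
# Deleting the row removes the vector, the metadata, and the payload at once;
# there is no second system left holding a stale copy.
table.delete("source = 'ops'")
```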

In the Weaviate/Qdrant/Milvus model, S3 holds the raw documents and the vector database holds a derived representation. This creates the embedding drift problem: a document changes on S3, but the corresponding vector in the database goes stale. The LLM retrieves outdated context and confidently hallucinates incorrect answers. Keeping these systems synchronized requires an explicit pipeline — S3 event notifications → embedding worker → vector database write — and every step is a potential failure point.4
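The shape of that pipeline, sketched as an AWS Lambda-style handler on S3 event notifications; embed(), upsert_vector(), and delete_vector() are hypothetical stand-ins for your embedding model and vector database client:

```python
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        if record["eventName"].startswith("ObjectRemoved"):
            # The source document is gone; the derived vector must go too,
            # or retrieval keeps surfacing content that no longer exists.
            delete_vector(doc_id=key)  # hypothetical vector DB call
            continue

        # Created or overwritten: re-embed and replace the stale vector.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        upsert_vector(doc_id=key, vector=embed(body))  # hypothetical calls
```

Each stage — event delivery, the embedding worker, the database write — can fail independently, which is exactly the synchronization burden the single-system model avoids.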

This is not an argument that LanceDB is always better. It is an observation that the serverless-on-S3 model eliminates an entire category of failure by construction, while the stateful model trades that simplicity for latency and feature advantages.

The Latency Floor

S3-compatible object storage communicates via HTTP. Every query against LanceDB on S3 pays at minimum one HTTP round-trip to fetch index fragments — typically 50-200ms depending on the storage backend and geographic proximity. With NVMe caching enabled, warm queries drop to ~25ms as hot index fragments are served from local disk. But the first query for any given data always hits the S3 latency floor.5

Stateful vector databases avoid this entirely. Weaviate and Qdrant hold their HNSW indexes in memory or on local SSD. P50 query latency sits at 1-5ms. For applications with strict latency requirements — voice agents needing <200ms end-to-end, real-time recommendation engines, interactive search — that gap is decisive.2

The question is not "which is faster" (stateful servers win trivially) but "does the latency difference matter for your use case." Batch RAG pipelines, document search applications, async AI agents, and analytical workloads operate comfortably within LanceDB's latency range. Real-time conversational interfaces and trading systems cannot.

Hybrid Search: Native vs. Assembled

Weaviate provides hybrid search — fusing BM25 keyword matching and dense vector similarity in a single query — as a first-class built-in capability. The fusion algorithm (reciprocal rank fusion by default) is configurable, and the BM25 index and HNSW graph are co-located in the same process.6
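As a rough illustration with the v4 Python client, assuming a local instance and a collection named Article that has a vectorizer configured:

```python
import weaviate
from weaviate.classes.query import HybridFusion

client = weaviate.connect_to_local()
articles = client.collections.get("Article")

# alpha balances keyword vs. vector scores (0 = pure BM25, 1 = pure vector);
# fusion_type selects the rank-fusion algorithm.
response = articles.query.hybrid(
    query="S3 lifecycle policies",
    alpha=0.5,
    fusion_type=HybridFusion.RANKED,
    limit=10,
)
for obj in response.objects:
    print(obj.properties)

client.close()
```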

Qdrant approaches the problem differently. Its core strength is payload-level filtering: metadata constraints (date ranges, document types, tenant IDs) are applied during HNSW traversal, not after. This avoids the recall degradation that post-filtering causes on approximate nearest neighbor graphs. But full-text BM25 is not built in — you need an external search engine for keyword matching.7
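A sketch with the qdrant-client package; the collection name, payload fields, and vector are illustrative:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")

# The filter is evaluated during HNSW traversal rather than applied to the
# result set afterwards, so recall holds up under tight constraints.
hits = client.search(
    collection_name="documents",
    query_vector=[0.12, 0.98, 0.31, 0.44],
    query_filter=Filter(
        must=[
            FieldCondition(key="tenant_id", match=MatchValue(value="acme")),
            FieldCondition(key="year", range=Range(gte=2024)),
        ]
    ),
    limit=10,
)
```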

LanceDB added full-text search alongside vector queries, enabling hybrid retrieval directly on S3 without a separate search service. The fusion happens at the application level — less turnkey than Weaviate's built-in hybrid mode, but sufficient for many RAG pipelines.1
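One way that application-level fusion can look, assuming the table from the earlier sketch, a LanceDB version whose full-text index works against object storage, and a pre-computed query embedding; the reciprocal rank fusion here is a generic scheme written in application code, not a LanceDB API:

```python
import lancedb

db = lancedb.connect("s3://my-data-lake/vectors")
table = db.open_table("documents")
table.create_fts_index("text")  # one-time: build the full-text index

def rrf_fuse(vector_hits, keyword_hits, k=60):
    """Merge two ranked result lists with reciprocal rank fusion."""
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, item in enumerate(hits):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Normally produced by your embedding model; hard-coded here for illustration.
query_embedding = [0.11, 0.95, 0.30, 0.42]

vector_hits = [r["text"] for r in table.search(query_embedding).limit(20).to_list()]
keyword_hits = [r["text"] for r in table.search("lifecycle policies", query_type="fts").limit(20).to_list()]

top = rrf_fuse(vector_hits, keyword_hits)[:10]
```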

Milvus introduced hybrid search capabilities, but the implementation is younger and less battle-tested than Weaviate's. At billion-scale, Milvus's strength is raw vector throughput, not retrieval sophistication.3

The Operational Cost Spectrum

The operational burden of these tiers varies by an order of magnitude:

LanceDB: Zero infrastructure. pip install lancedb, configure S3 credentials, query. No server process, no backups to manage (the index is on S3), no capacity planning. The "ops cost" is a cron job running table.optimize() to compact index fragments during off-peak hours (a minimal sketch of that step follows the Milvus entry below).8

Qdrant: Single binary, runs as a Docker container or standalone process. Moderate operational load — monitor memory usage, configure snapshot-based backups (can target S3), plan for index growth. A single instance handles tens of millions of vectors. Horizontal scaling via sharding adds operational complexity but is straightforward.7

Weaviate: Go-based server, typically deployed via Docker or Kubernetes. Moderate-to-high operational load — multi-tenant configurations require resource isolation tuning, S3-tiered storage requires lifecycle configuration, and the HNSW index memory consumption needs active monitoring. More feature-rich than Qdrant, more operational surface area.6

Milvus: Distributed microservices architecture. Requires etcd for metadata coordination, Pulsar or Kafka for write-ahead logging, and MinIO or S3 for persistent storage. High operational load — this is a distributed system with multiple failure domains. Justified only at scales (>1B vectors) where single-node databases cannot physically fit the index.3
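The LanceDB maintenance step mentioned above is small enough to show in full. A sketch assuming the same illustrative bucket, run on a schedule by cron or any job runner:

```python
import lancedb

db = lancedb.connect("s3://my-data-lake/vectors")
table = db.open_table("documents")

# Compacts small fragments and cleans up superseded versions so query-time
# reads touch fewer objects on S3.
table.optimize()
```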

When Each Tier Wins

Choose LanceDB when:

  • S3 is already your data lake foundation and you want vector search without infrastructure sprawl
  • Your team is small and cannot dedicate engineering time to database operations
  • Sub-second latency (100-500ms) is acceptable for your retrieval workload
  • You want zero data duplication — vectors, metadata, and payloads in one place on S3
  • Your vector count is under ~1B and a single query node is sufficient

Choose Weaviate or Qdrant when:

  • Your application requires consistent sub-10ms retrieval latency
  • You need production-ready hybrid search (BM25 + vector) with minimal integration work
  • Multi-tenant data isolation is a hard requirement (SaaS, legal tech, financial platforms)
  • Your operational team can manage a stateful database with monitoring and backups
  • Qdrant specifically when metadata filtering is the dominant query pattern; Weaviate when hybrid keyword+vector search is the priority

Choose Milvus when:

  • Your vector count exceeds 1 billion and is growing
  • You need distributed query execution across multiple nodes
  • Your organization has platform engineering capacity to operate a multi-component distributed system
  • S3 cold storage offload is essential for cost management at extreme scale

The Emerging Middle Ground

The boundary between tiers is blurring. Weaviate's S3-tiered storage moves it closer to the "S3-native" model by offloading cold tenant data to object storage. LanceDB's NVMe caching narrows the latency gap with stateful databases. Milvus's S3 cold tier reduces the memory footprint that historically made it expensive at scale.

But the fundamental architectural divide remains: do you want your vector index to live on S3, or do you want a dedicated server sitting between your application and your storage? That choice determines your failure modes, your operational burden, and your cost structure more than any benchmark number.


Footnotes

  1. Storage Architecture in LanceDB — LanceDB S3-native index architecture

  2. Best Vector Databases in 2026: A Complete Comparison Guide — Cross-database latency and feature comparison

  3. Milvus Documentation — Distributed architecture with S3 cold storage

  4. Designing RAG Systems on AWS with S3 Vectors — Embedding drift and synchronization challenges

  5. Vector search on object storage: Performance at scale without the infrastructure tax — LanceDB S3 latency benchmarks

  6. Weaviate Documentation — Hybrid search and multi-tenancy

  7. Qdrant Documentation — Payload filtering and HNSW configuration

  8. Keeping Indexes Up-to-Date with Reindexing — LanceDB incremental index optimization