Weaviate
An open-source vector database with hybrid search combining BM25 keyword matching and vector similarity in a single query, plus multi-tenancy and S3-tiered cold storage.
Summary
Weaviate is the stateful vector search server for teams that need both keyword and semantic retrieval over S3-derived embeddings. Its tiered storage offloads cold tenants to S3, which aligns with the separation-of-storage-and-compute pattern. It represents the opposite architectural choice from LanceDB: a stateful, always-on server versus embedded, serverless queries.
- Weaviate is a stateful server requiring dedicated infrastructure — it is not serverless like LanceDB. Plan for operational overhead including backups, scaling, and upgrades.
- Hybrid search (BM25 + vector) is powerful but requires tuning the fusion algorithm. Default weights rarely match production relevance needs.
- Multi-tenancy isolates data but shares cluster resources. Noisy-neighbor effects are possible without proper resource limits.
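To make the fusion-tuning caveat concrete, here is a minimal sketch of score-based hybrid fusion in plain Python. It is an illustration under stated assumptions, not Weaviate's implementation: the helper names (`min_max_normalize`, `fuse_scores`, `rank`) and the sample scores are hypothetical, and the `alpha` convention (0 for keyword only, 1 for vector only) is chosen to mirror the weighting knob that hybrid queries typically expose.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps every entry to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    bm25: Dict[str, float], vector: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Weighted sum of normalized scores (alpha=0: keyword only, alpha=1: vector only)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    kw = min_max_normalize(bm25)
    vec = min_max_normalize(vector)
    # A document missing from one result list contributes 0 from that side.
    return {
        doc_id: (1 - alpha) * kw.get(doc_id, 0.0) + alpha * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }


def rank(fused: Dict[str, float]) -> List[str]:
    """Order document ids by descending fused score, breaking ties by id."""
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical scores: BM25 is unbounded, vector similarity is in [0, 1].
bm25_scores = {"doc_a": 7.2, "doc_b": 3.1, "doc_c": 0.4}
vector_scores = {"doc_b": 0.91, "doc_c": 0.88, "doc_d": 0.35}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, rank(fuse_scores(bm25_scores, vector_scores, alpha)))
```

The top result changes as `alpha` moves between 0 and 1, which is why a default weighting should be checked against a labeled relevance set before it reaches production.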
scoped_to: Vector Indexing on Object Storage (stores cold vectors on S3)
solves: Cold Scan Latency (pre-indexed hybrid search over embeddings)
alternative_to: LanceDB (stateful server vs. serverless on S3)
Definition
An open-source vector database with hybrid search combining vector similarity and BM25 keyword scoring. Supports multi-tenancy and tiered storage that offloads inactive tenants to S3-compatible backends.
Pure vector search misses keyword-exact matches and pure keyword search misses semantic meaning. Weaviate combines both retrieval modes in a single query, reducing the need for separate search pipelines. Its multi-tenant architecture lets SaaS platforms isolate customer data while sharing infrastructure, and its S3 tiered storage keeps cold data off expensive local disks.
Typical use cases: hybrid semantic and keyword search over S3-derived embeddings, multi-tenant RAG backends, and tiered embedding storage with hot data in memory and cold data on S3.
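The hot/cold tenant tiering described above can be sketched as a small planning function. This is an illustrative model only: `TenantState`, `Tenant`, `plan_tiering`, the tenant names, and the 14-day idle threshold are all hypothetical and are not taken from Weaviate's API or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, List


class TenantState(Enum):
    HOT = "hot"    # loaded on the cluster and serving queries
    COLD = "cold"  # offloaded to S3-compatible object storage


@dataclass
class Tenant:
    name: str
    last_queried: datetime
    state: TenantState = TenantState.HOT


def plan_tiering(
    tenants: List[Tenant], now: datetime, idle_after: timedelta
) -> Dict[str, TenantState]:
    """Return the target state for every tenant whose state should change."""
    changes: Dict[str, TenantState] = {}
    for tenant in tenants:
        is_idle = now - tenant.last_queried >= idle_after
        target = TenantState.COLD if is_idle else TenantState.HOT
        if target is not tenant.state:
            changes[tenant.name] = target
    return changes


now = datetime(2026, 1, 31)
tenants = [
    Tenant("acme", last_queried=datetime(2026, 1, 30)),    # active, stays hot
    Tenant("globex", last_queried=datetime(2026, 1, 1)),   # idle, should offload
    Tenant("initech", last_queried=datetime(2026, 1, 30), state=TenantState.COLD),
]
changes = plan_tiering(tenants, now, idle_after=timedelta(days=14))
print(changes)
```

Keeping the decision separate from the storage calls that act on it makes the policy easy to test and to tune per customer tier.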
Recent developments
Source mix note: Weaviate's recent third-party benchmark coverage is heavy; the bullets below cite multiple comparison surveys to triangulate performance positioning rather than rely on any single vendor-published number.
- Hybrid search positioning: HNSW, BM25F, compression, and hybrid ranking as first-class features. Per TechBytes' 2026 Pinecone/Weaviate/pgvector comparison, Weaviate is strongest when search itself is the product, because HNSW, BM25F lexical scoring, vector compression, and hybrid ranking are all built in rather than bolted on. The 2026 convergence trends across the vector database landscape are compression by default, read/write disaggregation, and hybrid (vector + keyword) ranking; Weaviate has been ahead of that curve for two years.
- Concrete benchmark positioning at 1M-vector scale. Per the HolySheep showdown vs Milvus + Qdrant (April 2026), Weaviate 1.23 posts cold query p50 35ms / p99 189ms, warm query p50 11ms / p99 47ms, bulk insert of 1M vectors in 6m 03s, and a 98.41% success rate under 1000 QPS sustained load. Treat these as third-party positioning data, not official Weaviate measurements.
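To compare the figures above against measurements on your own hardware, the arithmetic can be sketched as follows. The helper names and the latency samples are hypothetical; only the 1M-vector insert time and the 98.41% success rate at 1000 QPS come from the cited benchmark. The percentile uses the nearest-rank method, which is one of several common definitions and may differ from the one the benchmark used.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def insert_throughput(vectors: int, minutes: int, seconds: int) -> float:
    """Average vectors inserted per second over the whole bulk load."""
    return vectors / (minutes * 60 + seconds)


# Hypothetical warm-query latencies in milliseconds from a local test run.
latencies_ms = [9, 10, 10, 11, 11, 11, 12, 13, 20, 47]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

# Figures reported in the benchmark bullet above.
rate = insert_throughput(1_000_000, minutes=6, seconds=3)
failed_per_second = (1 - 0.9841) * 1000

print(f"p50={p50}ms p99={p99}ms")
print(f"bulk insert: {rate:.0f} vectors/s, failures at 1000 QPS: {failed_per_second:.1f}/s")
```

Read this way, the reported bulk insert works out to roughly 2,755 vectors per second, and the success rate implies about 16 failed requests per second at the stated load.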
Resources
Official documentation covering hybrid vector-keyword search, schema design, multi-tenancy, and S3-tiered storage configuration.
Source repository for the Go-based vector search engine with release notes, issue tracking, and architecture documentation.