Weaviate
An open-source vector database with hybrid search combining BM25 keyword matching and vector similarity in a single query, plus multi-tenancy and S3-tiered cold storage.
Summary
Weaviate is a stateful vector search server for teams that need both keyword and semantic retrieval over S3-derived embeddings. Its tiered storage offloads cold tenants to S3, which fits the pattern of separating storage from compute. It is the opposite architectural choice from LanceDB: an always-on server that you operate yourself, versus embedded, serverless queries.
- Weaviate is a stateful server that requires dedicated infrastructure; unlike LanceDB, it is not serverless. Plan for operational overhead, including backups, scaling, and upgrades.
- Hybrid search (BM25 + vector) is powerful but requires tuning the fusion algorithm. Default weights rarely match production relevance needs.
- Multi-tenancy isolates data but shares cluster resources. Noisy-neighbor effects are possible without proper resource limits.
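The fusion tuning mentioned in the list above can be illustrated with a minimal sketch. It assumes a score-based fusion in which each result set is min-max normalized and then blended with a single weight, here called `alpha` (0 means keyword only, 1 means vector only). The function names, the document IDs, and the scores are hypothetical, and this is not presented as Weaviate's internal implementation; it only shows why the choice of weight can change the ranking.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    bm25: Dict[str, float], vector: Dict[str, float], alpha: float
) -> List[Tuple[str, float]]:
    """Blend normalized keyword and vector scores with a single weight.

    alpha = 0.0 -> keyword only, alpha = 1.0 -> vector only (assumed convention).
    A document missing from one result set contributes 0 for that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    kw = min_max_normalize(bm25)
    vec = min_max_normalize(vector)
    fused = {
        doc_id: (1 - alpha) * kw.get(doc_id, 0.0) + alpha * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Sort by fused score descending, then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for three documents (higher is better on both sides).
bm25_scores = {"doc_a": 12.0, "doc_b": 3.0, "doc_c": 0.5}
vector_scores = {"doc_a": 0.20, "doc_b": 0.85, "doc_c": 0.90}

keyword_heavy = fuse(bm25_scores, vector_scores, alpha=0.25)
vector_heavy = fuse(bm25_scores, vector_scores, alpha=0.75)
```

With these made-up scores, the keyword-heavy weighting ranks `doc_a` first while the vector-heavy weighting ranks it last, which is the kind of shift worth evaluating against real relevance judgments before settling on a weight.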
- scoped_to: Vector Indexing on Object Storage (stores cold vectors on S3)
- solves: Cold Scan Latency (pre-indexed hybrid search over embeddings)
- alternative_to: LanceDB (stateful server vs. serverless on S3)
Definition
An open-source vector database with hybrid search combining vector similarity and BM25 keyword scoring. Supports multi-tenancy and tiered storage that offloads inactive tenants to S3-compatible backends.
Pure vector search misses exact keyword matches, and pure keyword search misses semantic meaning. Weaviate combines both retrieval modes in a single query, reducing the need for separate search pipelines. Its multi-tenant architecture lets SaaS platforms isolate customer data while sharing infrastructure, and its S3 tiered storage keeps cold data off expensive local disks.
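The tiering idea described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `TenantStore` class, its method names, and the idle-timeout policy are all hypothetical, and an in-memory dictionary stands in for an S3 bucket. How and when Weaviate actually moves tenants between tiers is governed by its own configuration and API, not by this code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Vectors = List[List[float]]


@dataclass
class TenantStore:
    """Toy model: active tenants live in `local`, offloaded ones in `cold`."""

    local: Dict[str, Vectors] = field(default_factory=dict)
    cold: Dict[str, Vectors] = field(default_factory=dict)  # stand-in for S3
    last_access: Dict[str, float] = field(default_factory=dict)

    def add(self, tenant: str, vectors: Vectors, now: float) -> None:
        """Register a tenant's vectors on the hot tier."""
        self.local[tenant] = list(vectors)
        self.last_access[tenant] = now

    def offload_idle(self, now: float, max_idle_s: float) -> List[str]:
        """Move tenants idle longer than max_idle_s to the cold tier."""
        idle = sorted(
            t for t in self.local if now - self.last_access[t] > max_idle_s
        )
        for tenant in idle:
            self.cold[tenant] = self.local.pop(tenant)
        return idle

    def get(self, tenant: str, now: float) -> Vectors:
        """Return a tenant's vectors, reloading from the cold tier if needed."""
        if tenant not in self.local:
            # Raises KeyError if the tenant is unknown to both tiers.
            self.local[tenant] = self.cold.pop(tenant)
        self.last_access[tenant] = now
        return self.local[tenant]


store = TenantStore()
store.add("acme", [[0.1, 0.2]], now=0.0)      # hypothetical tenants
store.add("globex", [[0.3, 0.4]], now=90.0)
offloaded = store.offload_idle(now=100.0, max_idle_s=60.0)
```

In this run only the tenant that has been idle past the threshold is moved to the cold tier, and a later `get` call brings it back. The trade-off the model makes visible is that the first query against an offloaded tenant pays a reload cost in exchange for lower local storage use.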
Common uses: hybrid semantic and keyword search over S3-derived embeddings, multi-tenant RAG backends, and tiered embedding storage with hot data in memory and cold data on S3.
Resources
Official documentation covering hybrid vector-keyword search, schema design, multi-tenancy, and S3-tiered storage configuration.
Source repository for the Go-based vector search engine with release notes, issue tracking, and architecture documentation.