Weaviate

An open-source vector database with hybrid search combining BM25 keyword matching and vector similarity in a single query, plus multi-tenancy and S3-tiered cold storage.

Summary

Where it fits

Weaviate is a stateful vector search server for teams that need both keyword and semantic retrieval over embeddings derived from S3 data. Its tiered storage offloads cold tenants to S3, in line with the separation-of-storage-and-compute pattern. It represents the opposite architectural choice from LanceDB: an always-on server (self-hosted or managed) versus embedded, serverless queries.

Misconceptions / Traps
  • Weaviate is a stateful server that requires dedicated infrastructure; it is not serverless like LanceDB. Plan for operational overhead, including backups, scaling, and upgrades.
  • Hybrid search (BM25 + vector) is powerful but needs tuning: the alpha weight (0 = pure keyword, 1 = pure vector) and the fusion method (rankedFusion or relativeScoreFusion). Default settings may not match production relevance needs, so evaluate them against labeled queries.
  • Multi-tenancy isolates data but shares cluster resources. Noisy-neighbor effects are possible without proper resource limits.
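
To make the tuning point concrete, here is a minimal Python sketch of choosing a hybrid weight from labeled queries. It assumes min-max normalization of each result list followed by an alpha-weighted sum, which is the idea behind Weaviate's relativeScoreFusion, but it is not Weaviate's implementation; the function names, the two-query test set, and all scores are hypothetical. Following Weaviate's convention, alpha = 1 means pure vector search and alpha = 0 pure keyword search.

```python
def relative_score_fusion(bm25, vector, alpha):
    """Min-max normalize each score map to [0, 1], then blend with alpha.

    bm25 and vector map doc_id -> raw score. A document missing from one
    result list contributes 0 from that side.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # If every score is equal, treat all documents as equally relevant.
        return {d: (s - lo) / span if span else 1.0 for d, s in scores.items()}

    nb, nv = normalize(bm25), normalize(vector)
    fused = {d: alpha * nv.get(d, 0.0) + (1 - alpha) * nb.get(d, 0.0)
             for d in nb.keys() | nv.keys()}
    # Highest score first; ties broken by doc ID so results are deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


def hit_rate_at_k(queries, alpha, k=1):
    """Fraction of queries whose labeled relevant doc lands in the top k."""
    hits = sum(
        q["relevant"] in relative_score_fusion(q["bm25"], q["vector"], alpha)[:k]
        for q in queries
    )
    return hits / len(queries)


# Hypothetical judged queries: one favors exact keywords, one favors semantics.
QUERIES = [
    {"relevant": "d1",  # e.g. a query quoting an exact error code
     "bm25": {"d1": 9.0, "d2": 1.0},
     "vector": {"d2": 0.9, "d1": 0.8, "d3": 0.1}},
    {"relevant": "d5",  # e.g. a paraphrase sharing few keywords with the answer
     "bm25": {"d4": 5.0, "d5": 1.0},
     "vector": {"d5": 0.95, "d4": 0.2}},
]

sweep = {a: hit_rate_at_k(QUERIES, a) for a in (0.0, 0.25, 0.5, 0.75, 1.0)}
best_alpha = max(sweep, key=sweep.get)
print(sweep)
print("best alpha:", best_alpha)  # best alpha: 0.75
```

On this toy data neither extreme wins: alpha = 0 misses the paraphrased query and alpha = 1 misses the exact-keyword query, while 0.75 ranks the relevant document first for both. A real sweep would use a much larger judged query set and a metric such as recall@k or nDCG.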

Key Connections
  • scoped_to Vector Indexing on Object Storage — stores cold vectors on S3
  • solves Cold Scan Latency — pre-indexed hybrid search over embeddings
  • alternative_to LanceDB — stateful server vs serverless on S3

Definition

What it is

An open-source vector database with hybrid search combining vector similarity and BM25 keyword scoring. Supports multi-tenancy and tiered storage that offloads inactive tenants to S3-compatible backends.
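
A minimal Python sketch of how two ranked result lists can be merged into one, using alpha-weighted reciprocal rank fusion. This is similar in spirit to Weaviate's rankedFusion option, but the exact scoring details here (0-based ranks, the constant k = 60 commonly used for reciprocal rank fusion) are assumptions, and the document IDs are made up.

```python
from collections import defaultdict


def ranked_fusion(bm25_ids, vector_ids, alpha=0.5, k=60):
    """Merge two ranked ID lists with alpha-weighted reciprocal rank scores.

    alpha=1.0 keeps only the vector ranking, alpha=0.0 only the BM25 ranking.
    Ranks are 0-based; a larger k flattens the gap between top and lower ranks.
    """
    scores = defaultdict(float)
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] += alpha / (rank + k)
    for rank, doc_id in enumerate(bm25_ids):
        scores[doc_id] += (1 - alpha) / (rank + k)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for a query like "reset error E1234".
bm25_hits = ["doc-e1234", "doc-a", "doc-b"]    # exact-term matches first
vector_hits = ["doc-a", "doc-c", "doc-e1234"]  # semantic neighbors first
print(ranked_fusion(bm25_hits, vector_hits))  # ['doc-a', 'doc-e1234', 'doc-c', 'doc-b']
```

Because only ranks are used, raw BM25 and cosine scores never need to be put on a common scale. The trade-off is that large score gaps within one list are ignored, which is what a score-based method such as relativeScoreFusion tries to preserve.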

Why it exists

Pure vector search can miss exact keyword matches such as product codes or error strings, and pure keyword search misses semantic meaning. Weaviate combines both retrieval modes in a single query, reducing the need for separate search pipelines. Its multi-tenant architecture lets SaaS platforms isolate customer data while sharing infrastructure, and its S3 tiered storage keeps cold data off expensive local disks.
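
A minimal Python sketch of an idle-time tiering policy for tenants. The three states are loosely modeled on Weaviate's tenant activity states (active, inactive, offloaded to S3), but the thresholds, class names, and the policy itself are hypothetical. The sketch only plans transitions; applying them would go through Weaviate's tenant management API.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ACTIVE = "active"        # loaded and queryable
    INACTIVE = "inactive"    # kept on local disk, not loaded
    OFFLOADED = "offloaded"  # moved to S3-compatible object storage


@dataclass
class TenantState:
    name: str
    idle_hours: float
    tier: Tier = Tier.ACTIVE


# Hypothetical thresholds; real values depend on workload and cost targets.
DEACTIVATE_AFTER_H = 24
OFFLOAD_AFTER_H = 24 * 30


def plan_tier(t: TenantState) -> Tier:
    """Return the tier this idle-time policy would assign to a tenant."""
    if t.idle_hours >= OFFLOAD_AFTER_H:
        return Tier.OFFLOADED
    if t.idle_hours >= DEACTIVATE_AFTER_H:
        return Tier.INACTIVE
    return Tier.ACTIVE


def plan_changes(tenants):
    """List (name, from_tier, to_tier) moves; unchanged tenants are skipped."""
    changes = []
    for t in tenants:
        target = plan_tier(t)
        if target != t.tier:
            changes.append((t.name, t.tier, target))
    return changes


tenants = [
    TenantState("acme", idle_hours=2),
    TenantState("globex", idle_hours=72),
    TenantState("initech", idle_hours=24 * 90, tier=Tier.INACTIVE),
]
for name, old, new in plan_changes(tenants):
    print(f"{name}: {old.value} -> {new.value}")
```

Offloading trades storage cost for reactivation latency: an offloaded tenant has to be brought back from object storage before it can be queried, so thresholds should reflect how often dormant customers return.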

Primary use cases

  • Hybrid semantic + keyword search over embeddings derived from S3 data
  • Multi-tenant RAG backends
  • Tiered embedding storage, with active tenants' indexes in memory and offloaded tenants on S3