Chroma
Definition
Chroma is open-source, AI-native search infrastructure with a client-server architecture and pluggable storage backends. In embedded mode it runs in-process on SQLite plus an HNSW index (via hnswlib); in server mode it runs as a standalone gRPC/REST service. Cloud and self-hosted deployments use a tiered storage architecture (hot data in a memory cache, warm data on SSD, cold data in S3/GCS object storage), with automatic, query-aware data tiering managed by the runtime. In the LangChain/LlamaIndex ecosystem it is known as the lowest-friction vector database to spin up locally.
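The hot/warm/cold read path described above can be illustrated with a toy sketch. It is not Chroma's implementation: plain dictionaries stand in for the memory cache, the SSD cache, and the object store, and the `TieredStore` class, its capacities, and the least-recently-used promotion policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy three-tier read path: hot (memory) -> warm (SSD) -> cold (object storage).

    Plain dicts stand in for every tier; this is an illustration, not Chroma's code.
    """

    def __init__(self, cold, hot_capacity=2, warm_capacity=4):
        self.hot = OrderedDict()   # stands in for the in-memory cache
        self.warm = OrderedDict()  # stands in for the local SSD cache
        self.cold = cold           # stands in for S3/GCS, which holds every record
        self.hot_capacity = hot_capacity
        self.warm_capacity = warm_capacity

    def _put(self, tier, capacity, key, value):
        # Insert as most recently used; return the evicted (key, value), if any.
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > capacity:
            return tier.popitem(last=False)
        return None

    def get(self, key):
        """Return (value, tier_that_served_the_read) and promote the key to hot."""
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key], "hot"
        if key in self.warm:
            value, served_by = self.warm.pop(key), "warm"
        else:
            value, served_by = self.cold[key], "cold"  # KeyError if unknown
        evicted = self._put(self.hot, self.hot_capacity, key, value)
        if evicted is not None:
            # Demote hot-tier evictions to warm. Warm evictions are simply
            # dropped, because the cold tier still holds every record.
            self._put(self.warm, self.warm_capacity, *evicted)
        return value, served_by


store = TieredStore(cold={f"seg-{i}": f"vectors-{i}" for i in range(6)})
first = store.get("seg-0")    # served from cold, then promoted to hot
second = store.get("seg-0")   # served from hot
store.get("seg-1")
store.get("seg-2")            # hot is full, so seg-0 is demoted to warm
third = store.get("seg-0")    # served from warm
print(first, second, third)
```

The point of the sketch is only that repeated reads get cheaper while rarely touched data falls back to the cheapest tier. A real runtime would also account for writes, index structures, and query patterns.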
Most vector databases optimize for production at scale and treat developer ergonomics as secondary. Chroma inverted that priority: it is easy to embed in a Python notebook, has a small API surface, and works out of the box for prototypes and small-to-medium production deployments. As of 2026 the consensus has settled: Chroma is the right tool for single-node deployments of up to roughly 5-10M vectors. Past that threshold the **Amnesia Loop** failure mode sets in (retrieval timeouts → agent falls back to generic model knowledge → user-visible quality regression), and operators typically graduate to Qdrant, Milvus, or Pinecone.
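One way to keep the Amnesia Loop from going unnoticed is to make retrieval timeouts explicit rather than letting the agent silently answer without context. The sketch below uses only the Python standard library; `retrieve_with_deadline`, `answer`, and the two stand-in retrievers are hypothetical names, not part of Chroma or any agent framework, and the timeout values are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_POOL = ThreadPoolExecutor(max_workers=4)


def retrieve_with_deadline(retriever, query, timeout_s):
    """Run retriever(query) with a deadline; return (documents, grounded)."""
    future = _POOL.submit(retriever, query)
    try:
        return future.result(timeout=timeout_s), True
    except FutureTimeout:
        # Retrieval missed its deadline. Report that explicitly so the caller
        # cannot mistake "no context" for "nothing relevant was found".
        # cancel() only helps if the call has not started yet; a call that is
        # already running keeps going in the background.
        future.cancel()
        return [], False


def answer(query, retriever, timeout_s=0.1):
    docs, grounded = retrieve_with_deadline(retriever, query, timeout_s)
    if not grounded:
        # Surface the degradation instead of falling back to model knowledge.
        return {"grounded": False, "text": "Retrieval timed out; answer withheld."}
    return {"grounded": True, "text": f"Answer based on {len(docs)} document(s)."}


def fast_retriever(query):   # hypothetical stand-in for a healthy vector store
    return ["doc-1", "doc-2"]


def slow_retriever(query):   # hypothetical stand-in for an overloaded one
    time.sleep(0.5)
    return ["doc-1"]


ok = answer("what is HNSW?", fast_retriever)
degraded = answer("what is HNSW?", slow_retriever)
print(ok)
print(degraded)
```

Counting how often `grounded` comes back `False` gives operators an early signal that a deployment is approaching its ceiling, before users notice the quality regression.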
Typical use cases: local prototyping and developer experimentation, RAG pipelines under 5-10M vectors, embedded vector search inside Python applications and notebooks, LangChain/LlamaIndex tutorial backends, and edge AI deployments where the entire vector store must fit on a single node.
Recent developments
- Tiered-storage architecture with S3/GCS cold tier. Chroma's Query Layer now uses a memory cache (hot) and an SSD cache (warm), backed by a Storage Layer that keeps all vectors, metadata, and indexes in S3/GCS (cold), with automatic query-aware data tiering. Per the trychroma.com products page.
- Production scaling ceiling named explicitly. Independent comparisons now position Chroma as not production-grade at scale: performance degrades sharply past 10M vectors, with the Amnesia Loop pattern as the failure signature. Operators are advised to graduate to Qdrant, Milvus, or Pinecone above that threshold. Per the Encore guide and RankSquire 2026.
- 2026 migration playbook published. RankSquire published a ranking of five migration alternatives for teams outgrowing Chroma, naming a landing spot for each of several access patterns. Per Chroma Database Alternative 2026.
- DataRobot agentic-AI integration. DataRobot now ships Chroma as the default self-hosted vector DB for its agentic-AI runtime, with explicit S3-backed deployment docs. Per docs.datarobot.com.