Definition

What it is

An open-source **model gateway** that hides the complexity of calling hundreds of different LLM endpoints behind a unified, OpenAI-compatible API. It provides load balancing, automatic failover, cost optimization, and rate limiting. Most relevant from an S3 perspective, it includes a **response cache with an S3 backend** (`type: s3` in the LiteLLM cache configuration). The S3 backend keys entries on a hash of the request, so it matches repeated requests exactly; similarity-based (semantic) matching is provided by separate backends such as `redis-semantic` and `qdrant-semantic`. On a cache hit, LiteLLM returns the stored response without calling the upstream LLM provider, avoiding the associated per-token cost.

Why it exists

Production LLM deployments rarely commit to a single model vendor: costs shift, capability windows move, regional availability varies, and failover requirements force multi-provider strategies. LiteLLM provides a single API surface that hides this multi-vendor complexity, plus an S3-backed response cache that turns repeated agentic queries into near-zero-cost lookups. The S3 cache layer is the operationally interesting piece: because its state lives outside the gateway process, it survives gateway restarts and is shared by all gateway instances behind a load balancer.
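Sharing works because every instance derives the same cache key from the same request. The sketch below assumes a hypothetical key scheme (SHA-256 over canonical JSON of model, messages, and parameters) and uses a plain dict as a stand-in for the S3 bucket; LiteLLM's real key derivation and object layout may differ. A restart is modeled as a new instance pointed at the same store.

```python
import hashlib
import json


def cache_key(model: str, messages: list, **params) -> str:
    # HYPOTHETICAL key scheme: canonical JSON (sorted keys, fixed separators)
    # so equal requests always serialize to the same bytes, regardless of
    # keyword-argument order, then hash to a fixed-length object key.
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class GatewayInstance:
    """One gateway replica; all replicas point at the same external store."""

    def __init__(self, name: str, shared_store: dict):
        self.name = name
        self.store = shared_store  # stand-in for an S3 bucket

    def get(self, model: str, messages: list, **params):
        return self.store.get(cache_key(model, messages, **params))

    def put(self, model: str, messages: list, response: str, **params) -> None:
        self.store[cache_key(model, messages, **params)] = response
```

A response written by one replica is visible to every other replica, and to a restarted replica, as long as the request (including parameters such as temperature) is identical.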

Primary use cases

Multi-provider LLM serving behind a unified API; S3-backed response caching for cost optimization; automatic failover across model vendors; audit-log streaming to S3 for observability and compliance; gateway-level rate limiting for cost governance.
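Failover across vendors amounts to walking an ordered list of models and moving on when a call fails. This sketch assumes `provider/model` strings (a naming style LiteLLM uses) dispatched to per-provider adapter functions; `ProviderError`, the adapters, and the vendor names in the usage are hypothetical, and real routing also weighs retries, cooldowns, and error types.

```python
from typing import Callable, Dict, List


class ProviderError(Exception):
    """HYPOTHETICAL error raised by an adapter when an upstream call fails."""


def complete_with_fallback(
    models: List[str],
    adapters: Dict[str, Callable[[str, str], str]],
    prompt: str,
) -> str:
    errors = []
    for model in models:
        # Split "provider/model-name" into the provider prefix and the model.
        provider, _, model_name = model.partition("/")
        adapter = adapters.get(provider)
        if adapter is None:
            errors.append(f"{model}: no adapter for provider '{provider}'")
            continue
        try:
            return adapter(model_name, prompt)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # record the failure, try the next model
    raise ProviderError("all models failed: " + "; ".join(errors))
```

For example, with `["vendor_a/m1", "vendor_b/m2"]`, an outage at `vendor_a` is absorbed by `vendor_b`, and the caller only sees an error when every entry in the list has failed.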

Recent developments

Latest signals

Latest release: v1.90.0 (June 2026). LiteLLM dropped the -stable version suffix at v1.84.0 and now ships plain semver; the legacy main-stable Docker tag is deprecated (retires June 30, 2026). Per BerriAI/litellm releases.
140+ providers, 2,500+ models behind a single OpenAI-compatible interface (March 2026). Covers OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Mistral, Ollama, vLLM, plus emerging providers like Nebius AI. The reference "one gateway to call them all" pattern for production LLM stacks. Per a2a-mcp — What Is LiteLLM? 140+ Providers 2026.
~40K GitHub stars, 1,300+ contributors. Among the most actively maintained AI infrastructure projects in 2026, with rapid model additions as new providers launch. Per GitHub — BerriAI/litellm.
Agent Hub, MCP support, sidecar architecture added. 2026 LiteLLM releases extended the proxy with an Agent Hub (cross-team agent registry), Model Context Protocol integration, and a sidecar architecture that reduces proxy overhead for high-QPS deployments. Per Nerd Level Tech — LiteLLM Proxy Production Tutorial 2026.
Production stack: Docker Compose + Postgres + virtual keys + per-team budgets + auto-failover. Standard 2026 deployment shape: LiteLLM proxy with Postgres for analytics/spend tracking, virtual API keys per team, and automatic fallback routing across Claude Sonnet 4.6 / GPT-5.4 / Gemini 2.5 Pro. Per Nerd Level Tech — LiteLLM Proxy Production Tutorial 2026.
Multi-tenant features: per-project logging/guardrails/caching, virtual keys, admin UI. Production-grade governance layer means LiteLLM is increasingly chosen at the platform-team level, not just by individual app developers. Per LiteLLM — AI Gateway docs.
Maxim 2026 LLM-router benchmark places LiteLLM in the top tier, alongside Portkey, Helicone, and Vercel AI Gateway. LiteLLM wins on provider count and OSS extensibility; commercial gateways win on managed-service polish. Per Maxim — Top 5 LLM Router Solutions 2026.
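The virtual-key and per-team budget pattern in the production stack above reduces to a lookup plus a spend check before each upstream call. This is a minimal in-memory sketch: the `Team` and `BudgetGate` names, the dicts standing in for the Postgres spend tables, and the up-front cost estimate are all hypothetical, not LiteLLM's schema or enforcement logic.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Team:
    name: str
    max_budget_usd: float
    spend_usd: float = 0.0  # running total; a real gateway persists this


@dataclass
class BudgetGate:
    teams: Dict[str, Team] = field(default_factory=dict)
    # HYPOTHETICAL mapping of virtual API key -> team name.
    virtual_keys: Dict[str, str] = field(default_factory=dict)

    def issue_key(self, key: str, team: str) -> None:
        self.virtual_keys[key] = team

    def authorize(self, key: str, estimated_cost_usd: float) -> bool:
        team_name = self.virtual_keys.get(key)
        if team_name is None:
            return False  # unknown key: reject before any upstream call
        team = self.teams[team_name]
        return team.spend_usd + estimated_cost_usd <= team.max_budget_usd

    def record_spend(self, key: str, cost_usd: float) -> None:
        # Attribute actual cost to the team that owns the virtual key.
        self.teams[self.virtual_keys[key]].spend_usd += cost_usd
```

Because the check runs at the gateway, teams never hold raw provider credentials; revoking a virtual key or lowering a team budget takes effect without touching the upstream accounts.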