Mooncake

Definition

What it is

The open-source LLM serving platform behind **Kimi**, Moonshot AI's leading LLM product. Repository: [github.com/kvcache-ai/Mooncake](https://github.com/kvcache-ai/Mooncake). Mooncake's distinguishing architectural feature is **disaggregated prefill**: the prefill compute pool is separated from the decode compute pool, and KV-cache state is handed from one to the other through a dedicated storage layer (DRAM, NVMe, or S3-compatible object storage). The pattern resolves a structural mismatch between the two phases. Prefill processes the whole prompt in one pass and is compute-bound, while decode generates one token per step and is memory-bandwidth-bound, so each phase favors different hardware.
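The hand-off described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Mooncake's API: the names `KVStore`, `PrefillWorker`, `DecodeWorker`, `toy_kv`, and `toy_next_token` are invented for this example, integer pairs stand in for real attention tensors, and an in-process dict stands in for the DRAM/NVMe/object-storage layer. The prefill worker computes the KV cache for the whole prompt and writes it to the store. The decode worker resumes from that cache instead of recomputing it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy stand-in for attention state: one (key, value) pair per token.
KVCache = List[Tuple[int, int]]


def toy_kv(token: int, position: int) -> Tuple[int, int]:
    """Hypothetical deterministic 'projection' of a token into a (key, value) pair."""
    return (token * 31 + position) % 997, (token * 17 + position) % 991


def toy_next_token(kv: KVCache) -> int:
    """Hypothetical 'model head': derive the next token from the whole cache."""
    return sum(k + v for k, v in kv) % 50_000


@dataclass
class KVStore:
    """Shared cache layer between the pools (stands in for DRAM/NVMe/object storage)."""
    _blobs: Dict[str, KVCache] = field(default_factory=dict)

    def put(self, request_id: str, kv: KVCache) -> None:
        self._blobs[request_id] = list(kv)  # copy, as if serialized to another tier

    def get(self, request_id: str) -> KVCache:
        return list(self._blobs[request_id])

    def delete(self, request_id: str) -> None:
        del self._blobs[request_id]


class PrefillWorker:
    """Compute-heavy pool: processes the entire prompt in one pass."""

    def __init__(self, store: KVStore) -> None:
        self.store = store

    def prefill(self, request_id: str, prompt: List[int]) -> int:
        kv = [toy_kv(tok, pos) for pos, tok in enumerate(prompt)]
        first_token = toy_next_token(kv)
        self.store.put(request_id, kv)  # hand the cache off instead of decoding here
        return first_token


class DecodeWorker:
    """Bandwidth-bound pool: extends the cache one token at a time."""

    def __init__(self, store: KVStore) -> None:
        self.store = store

    def decode(self, request_id: str, first_token: int, max_new_tokens: int) -> List[int]:
        kv = self.store.get(request_id)  # resume from transferred state, no recompute
        out = [first_token]
        while len(out) < max_new_tokens:
            kv.append(toy_kv(out[-1], len(kv)))  # cache the previously emitted token
            out.append(toy_next_token(kv))
        self.store.delete(request_id)  # free the slot once the request finishes
        return out


def monolithic_generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Reference path: prefill and decode on the same worker, no hand-off."""
    kv = [toy_kv(tok, pos) for pos, tok in enumerate(prompt)]
    out = [toy_next_token(kv)]
    while len(out) < max_new_tokens:
        kv.append(toy_kv(out[-1], len(kv)))
        out.append(toy_next_token(kv))
    return out


if __name__ == "__main__":
    store = KVStore()
    first = PrefillWorker(store).prefill("req-1", [101, 2009, 2003])
    print(DecodeWorker(store).decode("req-1", first, max_new_tokens=4))
```

The comparison with `monolithic_generate` shows what disaggregation does and does not change. It moves where each phase runs, not what is computed. The decode pool picks up exactly the state the prefill pool produced, so the two pools can be sized and provisioned independently.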

Why it exists

Production LLM serving at scale (Moonshot AI claims hundreds of thousands of concurrent users on the Kimi service) requires an architecture that does not waste prefill compute on decode-heavy workloads, or decode capacity on prefill-heavy ones. Mooncake releases the disaggregated-prefill pattern in open-source form, making the architecture reproducible outside Moonshot AI's internal infrastructure.
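The prefill/decode mismatch can be made concrete with a back-of-the-envelope arithmetic-intensity estimate (FLOPs per byte of memory traffic) for one dense linear layer. This is a simplified sketch under stated assumptions. It assumes fp16 operands (2 bytes each), a single matrix multiply in which each operand crosses memory once, and no attention or KV-cache traffic. The hidden size and token counts are illustrative, not Kimi's configuration.

```python
def linear_arithmetic_intensity(
    n_tokens: int, d_in: int, d_out: int, bytes_per_elem: int = 2
) -> float:
    """Approximate FLOPs per byte for Y = X @ W, X: (n_tokens, d_in), W: (d_in, d_out).

    Assumes weights, inputs, and outputs each cross memory exactly once and
    ignores attention and KV-cache traffic (a deliberate simplification).
    """
    flops = 2 * n_tokens * d_in * d_out  # one multiply and one add per weight per token
    bytes_moved = bytes_per_elem * (
        d_in * d_out          # weight matrix
        + n_tokens * d_in     # input activations
        + n_tokens * d_out    # output activations
    )
    return flops / bytes_moved


if __name__ == "__main__":
    d = 4096  # illustrative hidden size, not a specific model's
    print(f"prefill, 2048-token prompt: {linear_arithmetic_intensity(2048, d, d):.1f} FLOP/byte")
    print(f"decode, 1 token per step:   {linear_arithmetic_intensity(1, d, d):.1f} FLOP/byte")
```

With `d = 4096`, the 2048-token prefill works out to exactly 1024 FLOP/byte. A single-token decode step comes to about 0.9995 FLOP/byte. The same weights therefore do roughly a thousand times more work per byte read during prefill. That is why prefill tends to saturate compute while decode is limited by memory bandwidth, and why a shared pool sized for one phase is a poor fit for the other.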

Primary use cases

High-scale LLM serving with disaggregated prefill/decode, KV-cache transfer over object-storage substrates, Kimi-style long-context model serving, multi-tenant LLM platforms targeting cost-per-token optimization.
