Model Class

Kimi K2

Frontier open-weight Mixture-of-Experts large language model from Moonshot AI. Architecture: **1T total parameters, 32B activated per token**, 384 experts (8 selected + 1 shared per layer), 61 layers, and Multi-head Latent Attention (MLA) for KV-cache compression. Initially released in 2025; the current K2.6 release (April 20, 2026) adds a 400M-parameter MoonViT vision encoder for native multimodal input, 256K context across all variants, and an Agent Swarm system that scales to 300 sub-agents and 4,000 coordinated steps per query. The weights are downloadable under a Modified MIT license.
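The "8 selected + 1 shared" routing described above can be illustrated with a minimal sketch. The expert count, top-k value, and parameter totals come from the text; everything else is an assumption made for illustration: the names `route_token` and `moe_output` are hypothetical, the experts are toy scalar functions, and the softmax over the selected logits is one common gating choice, not necessarily the one Moonshot AI uses.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts per MoE layer (from the text)
TOP_K = 8          # routed experts selected per token (from the text)


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. This normalization
    is an illustrative assumption, not Moonshot's documented gating function.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(token, router_logits, routed_experts, shared_expert):
    """Add the always-active shared expert to the weighted routed experts."""
    out = shared_expert(token)
    for idx, weight in route_token(router_logits):
        out += weight * routed_experts[idx](token)
    return out


# Toy setup: scalar "tokens" and experts that simply scale their input.
rng = random.Random(0)
routed_experts = [lambda x, s=i / NUM_EXPERTS: s * x for i in range(NUM_EXPERTS)]
shared_expert = lambda x: x
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
y = moe_output(1.0, logits, routed_experts, shared_expert)

# Share of parameters touched per token: 32B active out of 1T total.
active_fraction = 32e9 / 1e12
```

Only 8 of the 384 routed experts run for a given token, plus the shared expert, which is why the active parameter count (about 3.2% of the total here) stays far below the full model size.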

Definition

What it is

Why it exists

The 2025-2026 frontier of open-weight models is dominated by aggressively sparse MoE architectures, which aim for frontier-class capability at a fraction of dense-model inference cost. Kimi K2 is Moonshot AI's bet that a tightly routed activation pattern (32B of 1T parameters per token, about 3.2%) beats dense alternatives on both training efficiency and serving cost. The 2026 evolution (K2.6) extends that bet to long-context agentic workloads, where the **disaggregated prefill** architecture published in Moonshot's open-source Mooncake serving platform makes million-token-context agent runs economically viable.

Primary use cases

Agentic coding assistants (Kimi Code Review claims up to 88% coding-cost reduction vs hosted frontier alternatives), long-context RAG over document corpora (256K context), multimodal document analysis via MoonViT, large-scale agent-swarm coordination (up to 300 sub-agents), and any open-weight deployment where frontier-class quality and the Modified MIT license matter (sovereign clouds, regulated industries).

Recent developments

Latest signals

Kimi K2.6 released April 20, 2026: 1T MoE, 256K context, MoonViT vision encoder, 300-agent swarms. Per "Moonshot AI Releases K2.6" (NYU Shanghai RITS) and the official GitHub.
SWE-Bench Pro: 58.6, ahead of GPT-5.4 (57.7), Gemini 3.1 Pro (54.2), and Claude Opus 4.6 max-effort (53.4). Per Miraflow analysis.
HLE-Full (with tools): 54.0, ahead of Claude Opus 4.6 (53.0), GPT-5.4 (52.1), and Gemini 3.1 Pro (51.4). Per Miraflow analysis.
Coding-cost savings up to 88% vs hosted frontier APIs. When deployed as the Kimi Code Review service, total coding-task spend reportedly drops by up to that amount against equivalent Claude/GPT calls. Per Medium (Ewan Mak review).
Technical deep-dive on the 384-expert MoE routing. IntuitionLabs published a detailed walk-through of K2's MoE layer structure, MLA implementation, and 61-layer composition. Per IntuitionLabs deep dive.

Connections 8

Outbound 4

scoped_to 1

AI Memory Infrastructure

depends_on 2

Multi-Head Latent Attention (MLA)
Mooncake

alternative_to 1

DeepSeek V3

Inbound 4

enables 2

Multi-Head Latent Attention (MLA)
DeepSeekMoE

competes_with 2

Llama 4
Qwen3