Model Class

Llama 4

Definition

What it is

Meta's open-weight LLM family, released April 5, 2025: the first Llama models to use a Mixture-of-Experts (MoE) architecture and the first natively multimodal Llama. Two production variants shipped publicly:
  • **Llama 4 Scout**: 109B total / 17B active / 16 experts / **10M-token context window** (described by Meta as industry-leading)
  • **Llama 4 Maverick**: ~400B total / 17B active / 128 experts / 1M-token context

A third variant, **Llama 4 Behemoth** (~2T total / 288B active), was previewed but never publicly released.

Maverick alternates dense and MoE layers for inference efficiency. Its MoE layers use 128 routed experts plus 1 shared expert, and each token is sent to the shared expert and to one routed expert.
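The routing rule described here (every token passes through the shared expert plus exactly one routed expert) can be illustrated with a toy sketch. This is not Meta's implementation: `make_expert`, `moe_layer`, the 8-dimensional hidden size, the random linear "experts", and the softmax gate are all hypothetical choices made for illustration, and real gating details vary by model.

```python
import math
import random


def make_expert(dim, rng):
    """Build a toy expert: a random dim x dim linear map standing in for an FFN."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, router_weights, routed_experts, shared_expert):
    """Route one token: the shared expert always runs, plus the top-1 routed expert."""
    # One router score per routed expert (dot product of router row and token).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    chosen = max(range(len(probs)), key=probs.__getitem__)

    shared_out = shared_expert(x)
    routed_out = routed_experts[chosen](x)
    # Assumption: scale the routed output by its router probability before summing.
    out = [s + probs[chosen] * r for s, r in zip(shared_out, routed_out)]
    return out, chosen


rng = random.Random(0)
DIM, NUM_ROUTED = 8, 128  # 128 routed experts as quoted above; DIM is a toy value

routed = [make_expert(DIM, rng) for _ in range(NUM_ROUTED)]
shared = make_expert(DIM, rng)
router = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_ROUTED)]

token = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
output, expert_index = moe_layer(token, router, routed, shared)
print(f"routed to expert {expert_index}; experts executed: 2 of {NUM_ROUTED + 1}")
```

Only two of the 129 expert networks run for any given token, which is why the active parameter count stays far below the total.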

Why it exists

The Llama 3 generation was dense and capped at 128K context. By mid-2024 the cost-per-token gap between dense Llama and sparse MoE alternatives (DeepSeek, Qwen, Kimi) had widened to the point that dense Llama was uncompetitive for production inference. Llama 4 is Meta's bet that the open-weight ecosystem it anchors needs MoE, multimodality, and extreme long context to stay relevant. Scout's 10M-token context is the strategic differentiator: the only open-weight model where an entire codebase or book corpus fits into the working window without external retrieval.
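The cost argument rests on how few weights a sparse model touches per token. As a rough illustration, the snippet below computes the active-to-total parameter ratio from the figures quoted in this entry. The Maverick and Behemoth totals are approximate, `active_fraction` is a hypothetical helper, and per-token compute only roughly tracks active parameters.

```python
# Parameter counts in billions as quoted in this entry: (total, active).
# Maverick and Behemoth totals are approximate.
MODELS = {
    "Llama 4 Scout": (109, 17),
    "Llama 4 Maverick": (400, 17),
    "Llama 4 Behemoth": (2000, 288),
}


def active_fraction(total_b, active_b):
    """Share of the model's weights used for a single token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of weights active per token")
```

A dense model sits at 100% on this measure, so the ratio gives a first-order sense of the per-token saving, ignoring memory footprint and routing overhead.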

Primary use cases

  • Analysis over 1M to 10M tokens of context (entire codebases, full books, multi-document corpora) via Scout.
  • Multimodal agentic workloads via Maverick.
  • Production serving through hosted providers such as Oracle GenAI, Together AI, and Fireworks.
  • Derivative model training: Llama 4 weights are the base for many fine-tunes.
  • Any workload where Meta's relatively permissive community license and the range of model sizes matter.
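Whether a corpus fits in a single context window can be estimated before any request is sent. The sketch below is a back-of-the-envelope check, not a tokenizer: the 4-characters-per-token ratio, the 8,192-token output reserve, and the helper names `estimate_tokens` and `models_that_fit` are assumptions for illustration. The window sizes are the ones quoted in this entry.

```python
# Context windows in tokens, as quoted in this entry.
CONTEXT_WINDOWS = {
    "Llama 4 Scout": 10_000_000,
    "Llama 4 Maverick": 1_000_000,
}

# Assumption: roughly 4 characters per token for English text; real tokenizers vary.
CHARS_PER_TOKEN = 4


def estimate_tokens(text):
    """Rough token estimate: character count divided by CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(documents, reserve_for_output=8_192):
    """Return the models whose window holds all documents plus an output reserve."""
    needed = sum(estimate_tokens(doc) for doc in documents) + reserve_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


# A synthetic 6-million-character corpus (about 1.5M estimated tokens).
corpus = ["x" * 6_000_000]
print(models_that_fit(corpus))
```

For a real decision, the estimate should be replaced with a count from the model's actual tokenizer, since code and non-English text often tokenize less efficiently than this ratio suggests.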

Recent developments

Latest signals
  • Llama 4 Scout: 10M-token context. Up from the 128K of Llama 3, Scout's 10M-token window is the open-weight ceiling for single-shot long-context inference, and Meta describes it as industry-leading. Per Meta AI blog.
  • Maverick: best-in-class multimodal. Per Meta's released benchmarks, Maverick exceeds GPT-4o and Gemini 2.0 Flash on coding, reasoning, multilingual, long-context, and image tasks, and is comparable to the larger DeepSeek V3 on coding and reasoning with less than half the active parameters. Per llama.com/models/llama-4.
  • Behemoth previewed but never released. The ~2T-total / 288B-active variant was withheld from public release, likely because of competitive pressure from DeepSeek V4, Qwen 3.6, and Kimi K2.6. Per Wikipedia (Llama LLM).
  • Behind DeepSeek V4 and Qwen 3.5/3.6 on coding and hard reasoning. Public benchmarks and community testing in 2026 put Maverick behind that frontier on coding and reasoning specifically, even while it leads on multimodal tasks. Per DeepLearning.AI, The Batch.
  • Available via Oracle GenAI catalog for enterprise procurement. Oracle ships Maverick in the OCI Generative AI service. Per Oracle docs.
