Technology

Multi-Token Prediction (MTP)

Predict N+1, N+2 tokens simultaneously for denser gradients.

1 connections

Definition

What it is

Predict N+1, N+2 tokens simultaneously for denser gradients.

Recent developments

Latest signals
  • Multi-Token Prediction objective impact on pre-training efficiency in small-scale language models. Analysis of MTP on 1B parameter models; NTP showed higher overall average performance than MTP on benchmark suite (ARC, BoolQ, HSWG, OBQA, PIQA, RACE, T/FQA). Per arxiv.org.
  • Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing. Training-free MTP approach using mask tokens from embedding space for parallel prediction without modifying model weights or auxiliary draft models. Per arxiv.org (2026-03-18).
  • Token-level Collaborative Alignment for LLM-based Generative Recommendation. TCA4Rec framework aligns collaborative filtering signals with token-level next-token prediction in LLM recommender systems. Per arxiv.org (2026-01-26).

Connections 1

Outbound 1