Prefill Tax
Definition
The compute cost of processing the input sequence before an LLM can generate its first output token. As prompts grow to hundreds of thousands or millions of tokens, the prefill phase can dominate inference latency and cost: even a single output token cannot be produced until attention has been computed over the entire input. The "tax" framing reflects that this work is non-optional and, with standard self-attention, grows superlinearly in prompt length (the attention term is quadratic), so it is paid in full even when the response is short.
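To make the superlinear growth concrete, here is a rough back-of-the-envelope sketch. It is not a measurement of any real system. It assumes a dense decoder-only transformer with standard self-attention, roughly 2 FLOPs per parameter per token for the weight matrices, and roughly 4 * layers * d_model * n^2 FLOPs for attention scores and weighted sums over a prompt of n tokens, ignoring the savings from causal masking (which would roughly halve the attention term). The function name and the model dimensions are hypothetical, chosen only for illustration.

```python
def prefill_flops(prompt_tokens: int, n_params: int, n_layers: int, d_model: int):
    """Rough prefill cost estimate, split into its linear and quadratic parts.

    Assumptions (illustrative only): dense transformer, standard
    self-attention, no causal-mask savings, no sparsity or caching tricks.
    """
    # Weight-matrix work: about 2 FLOPs per parameter per prompt token.
    linear = 2 * n_params * prompt_tokens
    # Attention work: QK^T and the attention-weighted sum of V each cost
    # about 2 * d_model FLOPs per (query, key) pair, in every layer.
    attention = 4 * n_layers * d_model * prompt_tokens ** 2
    return linear, attention


# Hypothetical model shape, not taken from any specific published model.
N_PARAMS, N_LAYERS, D_MODEL = 7_000_000_000, 32, 4096

for n in (1_000, 10_000, 100_000, 1_000_000):
    linear, attention = prefill_flops(n, N_PARAMS, N_LAYERS, D_MODEL)
    share = attention / (linear + attention)
    print(f"{n:>9} tokens: total {linear + attention:.2e} FLOPs, "
          f"attention share {share:.0%}")
```

Under these assumptions, doubling the prompt doubles the linear term but quadruples the attention term, and for the made-up dimensions above the attention term overtakes the linear term at a few tens of thousands of tokens. None of this depends on how long the response is, which is the point of the "tax" framing.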