Exploring the Transformer Series (30) --- Speculative Decoding
Speculative decoding, speculative sampling, blockwise parallel decoding, token tree verification, and Hugging Face implementation details.
DeepSeek MTP: EAGLE, HASS, classical multi-token prediction, DeepSeek's causal-chain design, formulas, and the vLLM implementation.
Lookahead decoding: Jacobi decoding, n-gram pool, 2D window, parallel verification, and llama.cpp implementation details.
Medusa: multi-decoding heads, tree attention, typical acceptance, sparse tree construction, training strategies, and decoding flow.