Exploring the Transformer Series (30) --- Speculative Decoding
Speculative decoding, speculative sampling, blockwise parallel decoding, token tree verification, and Hugging Face implementation details.
Lookahead decoding: Jacobi decoding, n-gram pool, 2D window, parallel verification, and llama.cpp implementation details.