Exploring the Transformer Series (30) --- Speculative Decoding
Speculative decoding, speculative sampling, blockwise parallel decoding, token tree verification, and Hugging Face implementation details.
Transformer generator heads, softmax, decoding strategies, sampling parameters, logits analysis, and weight sharing.