Exploring the Transformer Series (18) --- FlashAttention
FlashAttention, online softmax, tiling, IO-awareness, and memory-efficient exact attention.