Exploring the Transformer Series (27) --- MQA & GQA
MQA and GQA: MHA review, shared KV heads, grouped-query attention, implementation details, memory and speed tradeoffs, conversion, and optimization variants.