Exploring the Transformer Series (27) --- MQA & GQA
MQA and GQA: MHA review, shared KV heads, grouped-query attention, implementation details, memory and speed tradeoffs, conversion, and optimization variants.
MQA and GQA: MHA review, shared KV heads, grouped-query attention, implementation details, memory and speed tradeoffs, conversion, and optimization variants.