#mha - Tags - ML Learning Lab

2 posts · Transformer Series

Tag: #mha

Exploring the Transformer Series (27) --- MQA & GQA

🗓 2026-04-09 • Transformer Series • ⏱ 13 min read

MQA and GQA: MHA review, shared KV heads, grouped-query attention, implementation details, memory and speed tradeoffs, conversion, and optimization variants.

#transformer #mqa #gqa #attention #kv-cache #mha


Exploring the Transformer Series (12) --- Multi-head Self-Attention

🗓 2026-04-03 • Transformer Series • ⏱ 41 min read

Multi-head self-attention in Transformers: motivation, principles, implementation details, and modern head-composition improvements.

#transformer #multi-head-self-attention #attention #qkv #mha #optimization

