#llama3 - Tags - ML Learning Lab

3 posts · Transformer Series

Tag: #llama3

Exploring the Transformer Series (27) --- MQA & GQA

🗓 2026-04-09 • Transformer Series • ⏱ 13 min read

MQA and GQA: MHA review, shared KV heads, grouped-query attention, implementation details, memory and speed tradeoffs, conversion, and optimization variants.

#transformer #mqa #gqa #attention #kv-cache #mha

Exploring the Transformer Series (17) --- RoPE

🗓 2026-04-05 • Transformer Series • ⏱ 47 min read

RoPE positional encoding, derivation, properties, extrapolation, and implementation.

#transformer #rope #position-encoding #rotary-embedding #llm #attention

Exploring the Transformer Series (10) --- Self-Attention

🗓 2026-04-02 • Transformer Series • ⏱ 86 min read

Self-attention in Transformers: principles, implementation details, scaling/softmax analysis, and modern optimization directions.

#transformer #self-attention #qkv #softmax #llama3 #linear-attention

