Transformer Systems · Skill Path

LLM Inference Optimization

A focused path for understanding KV cache mechanics, shared-KV attention variants (MQA and GQA), efficient attention (FlashAttention), and KV cache optimization for serving.

Transformer Systems · 4 steps · Advanced to expert · 8.5 hr · 0 review decks

01 · KV Cache Mechanics · Lesson · Inference And Serving Mechanics · 2 hr · Reading
02 · MQA And GQA · Lesson · Inference And Serving Mechanics · 1.5 hr · Reading
03 · FlashAttention V1 · Lesson · Efficient Attention And KV Cache · 3 hr · Reading
04 · KV Cache Optimization · Lesson · Efficient Attention And KV Cache · 2 hr · Reading