Transformer Systems: From Tokens To Efficient LLM Serving
A systems-focused transformer course that starts with text-to-tensors, builds attention and training mechanics, then moves into LLM serving, KV cache, decoding, MoE, adaptation, and quantization.
Foundations And Data Flow
Text, tokens, embeddings, data loaders, and the high-level Transformer map.
- 01 Exploring the Transformer Series (1) --- Attention Mechanism intermediate · 1.5 hr · Reading
- 02 Exploring the Transformer Series (6) --- Token intermediate · 1.7 hr · Reading
- 03 Exploring the Transformer Series (7) --- Embedding intermediate · 1.5 hr · Reading
- 04 Exploring the Transformer Series (3) --- Data Processing intermediate · 1.3 hr · Reading
- 05 Exploring the Transformer Series (2) --- Overall Architecture intermediate · 2 hr · Reading
Attention And Positional Information
Position encodings, RoPE, self-attention, masks, and multi-head attention.
- 01 Exploring the Transformer Series (8) --- Position Encoding advanced · 2 hr · Reading
- 02 Exploring the Transformer Series (17) --- RoPE advanced · 2 hr · Reading
- 03 Exploring the Transformer Series (9) --- Position Encoding Classification advanced · 1.5 hr · Reading
- 04 Exploring the Transformer Series (10) --- Self-Attention advanced · 2.5 hr · Reading
- 05 Exploring the Transformer Series (11) --- Mask advanced · 1.7 hr · Flashcard deck
- 06 Exploring the Transformer Series (12) --- Multi-head Self-Attention intermediate · 1.5 hr · Reading
Transformer Blocks And Training
Encoder/decoder blocks, training mechanics, FFNs, normalization, sampling, and cost accounting.
- 01 Exploring the Transformer Series (4) --- Encoder & Decoder intermediate · 1.5 hr · Reading
- 02 Exploring the Transformer Series (5) --- Training & Inference advanced · 2 hr · Reading
- 03 Exploring the Transformer Series (13) --- FFN advanced · 2 hr · Reading
- 04 Exploring the Transformer Series (14) --- Residual Networks and Normalization advanced · 1.8 hr · Reading
- 05 Exploring the Transformer Series (15) --- Sampling and Output intermediate · 1.5 hr · Reading
- 06 Exploring the Transformer Series (16) --- Resource Consumption advanced · 2 hr · Reading
Inference And Serving Mechanics
KV cache behavior, MQA/GQA tensor shapes, and long-context extrapolation.
Efficient Attention And KV Cache
FlashAttention, KV cache optimization, long-context reuse, and prefill/decode scheduling.
- 01 Exploring the Transformer Series (18) --- FlashAttention expert · 3 hr · Reading
- 02 Exploring the Transformer Series (19) --- FlashAttention V2 and Its Upgrades expert · 2 hr · Reading
- 03 Exploring the Transformer Series (24) --- KV Cache Optimization advanced · 2 hr · Reading
- 04 Exploring the Transformer Series (25) --- KV Cache Optimization for Handling Long Text Sequences expert · 3 hr · Reading
- 05 Exploring the Transformer Series (26) --- KV Cache Optimization: Prefill/Decode (PD) Separation or Merging expert · 2 hr · Reading
MoE, Adaptation, And Compression
Mixture-of-experts systems, LoRA, quantization foundations, diagnostics, and schemes.
- 01 Exploring the Transformer Series (21) --- MoE expert · 3 hr · Reading
- 02 Exploring the Transformer Series (22) --- LoRA advanced · 2.5 hr · Reading
- 03 Exploring the Transformer Series (34) --- Quantization Fundamentals intermediate · 1.5 hr · Reading
- 04 Exploring the Transformer Series (35) --- Fundamentals of Large Model Quantization advanced · 2 hr · Reading
- 05 Exploring the Transformer Series (36) --- Large Model Quantization Schemes expert · 3 hr · Reading
Advanced Decoding And DeepSeek Systems
DeepSeek MLA/MoE/MTP and advanced decoding methods including speculation, Medusa, and lookahead decoding.
- 01 Exploring the Transformer Series (28) --- DeepSeek MLA expert · 2.5 hr · Reading
- 02 Exploring the Transformer Series (29) --- DeepSeek MoE expert · 2.5 hr · Reading
- 03 Exploring the Transformer Series (30) --- Speculative Decoding expert · 2 hr · Reading
- 04 Exploring the Transformer Series (31) --- Medusa expert · 2 hr · Reading
- 05 Exploring the Transformer Series (32) --- Lookahead Decoding advanced · 1.5 hr · Reading
- 06 Exploring the Transformer Series (33) --- DeepSeek MTP expert · 2 hr · Reading