Transformer Systems: From Tokens To Efficient LLM Serving

A systems-focused transformer course that starts with text-to-tensors, builds attention and training mechanics, then moves into LLM serving, KV cache, decoding, MoE, adaptation, and quantization.

Transformer Systems · 7 modules · 36 lessons · beginner to expert · 71.4 hr

Foundations And Data Flow

Text, tokens, embeddings, data loaders, and the high-level Transformer map.

5 lessons

01 · Exploring the Transformer Series (1): Attention Mechanism · intermediate · 1.5 hr · Reading
02 · Exploring the Transformer Series (6): Token · intermediate · 1.7 hr · Reading
03 · Exploring the Transformer Series (7): Embedding · intermediate · 1.5 hr · Reading
04 · Exploring the Transformer Series (3): Data Processing · intermediate · 1.3 hr · Reading
05 · Exploring the Transformer Series (2): Overall Architecture · intermediate · 2 hr · Reading

Attention And Positional Information

Position encodings, RoPE, self-attention, masks, and multi-head attention.

6 lessons

Transformer Blocks And Training

Encoder/decoder blocks, training mechanics, FFNs, normalization, sampling, and cost accounting.

6 lessons

Inference And Serving Mechanics

KV cache behavior, MQA/GQA tensor shapes, and long-context extrapolation.

3 lessons

Efficient Attention And KV Cache

FlashAttention, KV cache optimization, long-context reuse, and prefill/decode scheduling.

5 lessons

MoE, Adaptation, And Compression

Mixture-of-experts systems, LoRA, and quantization (foundations, diagnostics, and schemes).

5 lessons

Advanced Decoding And DeepSeek Systems

DeepSeek MLA/MoE/MTP and advanced decoding methods, including speculative decoding, Medusa, and lookahead decoding.

6 lessons