Transformer Systems: From Tokens To Efficient LLM Serving

A systems-focused Transformer course that starts with turning text into tensors, builds up attention and training mechanics, then moves into LLM serving, the KV cache, decoding, MoE, adaptation, and quantization.

7 modules · 36 lessons · beginner to expert · 71.4 hr

01

Foundations And Data Flow

Text, tokens, embeddings, data loaders, and the high-level Transformer map.

5 lessons
  1. Exploring the Transformer Series (1) --- Attention Mechanism · intermediate · 1.5 hr · Reading
  2. Exploring the Transformer Series (6) --- Token · intermediate · 1.7 hr · Reading
  3. Exploring the Transformer Series (7) --- Embedding · intermediate · 1.5 hr · Reading
  4. Exploring the Transformer Series (3) --- Data Processing · intermediate · 1.3 hr · Reading
  5. Exploring the Transformer Series (2) --- Overall Architecture · intermediate · 2 hr · Reading

02

Attention And Positional Information

Position encodings, RoPE, self-attention, masks, and multi-head attention.

6 lessons
  1. Exploring the Transformer Series (8) --- Position Encoding · advanced · 2 hr · Reading
  2. Exploring the Transformer Series (17) --- RoPE · advanced · 2 hr · Reading
  3. Exploring the Transformer Series (9) --- Position Encoding Classification · advanced · 1.5 hr · Reading
  4. Exploring the Transformer Series (10) --- Self-Attention · advanced · 2.5 hr · Reading
  5. Exploring the Transformer Series (11) --- Mask · advanced · 1.7 hr · Review deck (flashcards)
  6. Exploring the Transformer Series (12) --- Multi-Head Self-Attention · intermediate · 1.5 hr · Reading

03

Transformer Blocks And Training

Encoder/decoder blocks, training mechanics, FFNs, normalization, sampling, and cost accounting.

6 lessons
  1. Exploring the Transformer Series (4) --- Encoder & Decoder · intermediate · 1.5 hr · Reading
  2. Exploring the Transformer Series (5) --- Training & Inference · advanced · 2 hr · Reading
  3. Exploring the Transformer Series (13) --- FFN · advanced · 2 hr · Reading
  4. Exploring the Transformer Series (14) --- Residual Networks and Normalization · advanced · 1.8 hr · Reading
  5. Exploring the Transformer Series (15) --- Sampling and Output · intermediate · 1.5 hr · Reading
  6. Exploring the Transformer Series (16) --- Resource Consumption · advanced · 2 hr · Reading

04

Inference And Serving Mechanics

KV cache behavior, MQA/GQA tensor shapes, and long-context extrapolation.

3 lessons
  1. Exploring the Transformer Series (20) --- KV Cache · advanced · 2 hr · Reading
  2. Exploring the Transformer Series (27) --- MQA & GQA · advanced · 1.5 hr · Reading
  3. Exploring the Transformer Series (23) --- Length Extrapolation · advanced · 1.5 hr · Reading

05

Efficient Attention And KV Cache

FlashAttention, KV cache optimization, long-context reuse, and prefill/decode scheduling.

5 lessons
  1. Exploring the Transformer Series (18) --- FlashAttention · expert · 3 hr · Reading
  2. Exploring the Transformer Series (19) --- FlashAttention V2 and Its Upgrade · expert · 2 hr · Reading
  3. Exploring the Transformer Series (24) --- KV Cache Optimization · advanced · 2 hr · Reading
  4. Exploring the Transformer Series (25) --- KV Cache Optimization for Handling Long Text Sequences · expert · 3 hr · Reading
  5. Exploring the Transformer Series (26) --- KV Cache Optimization: Prefill/Decode (PD) Separation or Merging · expert · 2 hr · Reading

06

MoE, Adaptation, And Compression

Mixture-of-experts systems, LoRA, quantization foundations, diagnostics, and schemes.

5 lessons
  1. Exploring the Transformer Series (21) --- MoE · expert · 3 hr · Reading
  2. Exploring the Transformer Series (22) --- LoRA · advanced · 2.5 hr · Reading
  3. Exploring the Transformer Series (34) --- Quantization Fundamentals · intermediate · 1.5 hr · Reading
  4. Exploring the Transformer Series (35) --- Fundamentals of Large Model Quantization · advanced · 2 hr · Reading
  5. Exploring the Transformer Series (36) --- Large Model Quantization Schemes · expert · 3 hr · Reading

07

Advanced Decoding And DeepSeek Systems

DeepSeek MLA/MoE/MTP and advanced decoding methods including speculation, Medusa, and lookahead decoding.

6 lessons
  1. Exploring the Transformer Series (28) --- DeepSeek MLA · expert · 2.5 hr · Reading
  2. Exploring the Transformer Series (29) --- DeepSeek MoE · expert · 2.5 hr · Reading
  3. Exploring the Transformer Series (30) --- Speculative Decoding · expert · 2 hr · Reading
  4. Exploring the Transformer Series (31) --- Medusa · expert · 2 hr · Reading
  5. Exploring the Transformer Series (32) --- Lookahead Decoding · advanced · 1.5 hr · Reading
  6. Exploring the Transformer Series (33) --- DeepSeek MTP · expert · 2 hr · Reading