#parallelism - Tags - ML Learning Lab

4 posts · Transformer Series

Tag: #parallelism

🗓 2026-04-09 • Transformer Series • ⏱ 104 min read

KV cache optimization by separating or merging the prefill and decode (PD) phases: static batching, ORCA, Sarathi, DistServe, SplitWise, MemServe, TetriInfer, and Mooncake.

🗓 2026-04-07 • Transformer Series • ⏱ 87 min read

FlashAttention, online softmax, tiling, IO-awareness, and memory-efficient exact attention.

🗓 2026-04-07 • Transformer Series • ⏱ 79 min read

Mixture-of-Experts (MoE): conditional computation, routing, experts, load balancing, implementation, and parallel inference.

🗓 2026-04-01 • Transformer Series • ⏱ 38 min read

Transformer training and inference in practice: teacher forcing, masks, dropout, label smoothing, learning rate scheduling, and parallelism.