#kv-cache - Tags - ML Learning Lab

7 posts · Transformer Series

Tag: #kv-cache

Exploring the Transformer Series (28) --- DeepSeek MLA

🗓 2026-04-09 • Transformer Series • ⏱ 55 min read

DeepSeek MLA: low-rank KV compression, weight absorption, decoupled RoPE, resource tradeoffs, implementation details, and conversions from GQA and MHA.

#transformer #mla #deepseek #attention #kv-cache #rope

Exploring the Transformer Series (25) --- KV Cache Optimization for Handling Long Text Sequences

🗓 2026-04-09 • Transformer Series • ⏱ 105 min read

KV cache optimization for long text sequences: sparsification, token reuse, prefix reuse, retrieval-based schemes, and long-context KV management.

#transformer #kv-cache #optimization #long-context #inference #sparsification

Exploring the Transformer Series (26) --- KV Cache Optimization: PD Separation or Merging

🗓 2026-04-09 • Transformer Series • ⏱ 104 min read

KV cache optimization through PD separation or merging: static batching, ORCA, Sarathi, DistServe, SplitWise, MemServe, TetriInfer, and Mooncake.

#transformer #kv-cache #prefill #decode #parallelism #inference

Exploring the Transformer Series (24) --- KV Cache Optimization

🗓 2026-04-09 • Transformer Series • ⏱ 79 min read

KV cache optimization: metrics, the memory crisis, formula-based compression, stage-aware optimization, memory management, and scheduling.

#transformer #kv-cache #optimization #inference #memory #prefill

Exploring the Transformer Series (27) --- MQA & GQA

🗓 2026-04-09 • Transformer Series • ⏱ 13 min read

MQA and GQA: MHA review, shared KV heads, grouped-query attention, implementation details, memory and speed tradeoffs, conversion, and optimization variants.

#transformer #mqa #gqa #attention #kv-cache #mha

Exploring the Transformer Series (20) --- KV Cache

🗓 2026-04-07 • Transformer Series • ⏱ 50 min read

KV cache basics: redundancy in autoregressive inference, the KV cache, prefill vs. decode, implementation, and resource usage.

#transformer #kv-cache #inference #prefill #decode #memory

Exploring the Transformer Series (16) --- Resource Consumption

🗓 2026-04-05 • Transformer Series • ⏱ 35 min read

Transformer parameter counts, memory usage, activations, FLOPs, KV cache, and optimization directions.

#transformer #parameters #memory #activations #flops #kv-cache
