Exploring the Transformer Series (28) --- DeepSeek MLA
DeepSeek MLA: low-rank KV compression, weight absorption, decoupled RoPE, resource tradeoffs, implementation details, and conversions from GQA and MHA.
LoRA: PEFT, low-rank adaptation, rank, initialization, implementation, optimization, and continual learning.