Exploring the Transformer Series (24): KV Cache Optimization
KV Cache optimization: metrics, memory crisis, formula-based compression, stage-aware optimization, memory management, and scheduling.
FlashAttention, online softmax, tiling, IO-awareness, and memory-efficient exact attention.
Autoregressive inference redundancy, KV cache, prefill vs decode, implementation, and resource usage.
Transformer parameter counts, memory usage, activations, FLOPs, KV cache, and optimization directions.
OpenHands memory internals: layered memory architecture, View/ConversationMemory/Condenser workflow, and implementation details.