Exploring the Transformer Series (26) --- KV Cache Optimization: PD Separation or Merging
KV cache optimization through PD separation or merging: static batching, ORCA, Sarathi, DistServe, SplitWise, MemServe, TetriInfer, and Mooncake.
KV cache optimization: metrics, memory crisis, formula-based compression, stage-aware optimization, memory management, and scheduling.
Autoregressive inference redundancy, KV cache, prefill vs decode, implementation, and resource usage.