Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Updated Mar 17, 2026 - Python
vMLX - Cont Batch, Prefix, Paged, KV Cache Quant, VL - Powers MLX Studio. Image gen/edit, OpenAI/Anth
[MLSys-26] FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management