STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

Yuhang Han, Wenzheng Yang, Yujie Chen, Xiangqi Jin, Yaojie Zhang, Siteng Huang, Linfeng Zhang

🏛 Institutions: SJTU, HKUST (GZ), ZJU
📅 Date: June 1, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: efficiency, KV cache compression, vision-language model, training-free, STaR-KV

TLDR

STaR-KV is a training-free KV cache compression framework for GUI vision-language models, whose KV cache grows linearly with the number of interaction steps. It combines subspace-aware spatial scoring with temporal-stability re-weighting to decide which tokens to retain, reducing memory while preserving accuracy across GUI agent benchmarks.
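
To make the idea of score-based cache retention concrete, here is a minimal, generic sketch. It is not the paper's algorithm: the per-token saliency values, the variance-based stability weight, and the names `select_tokens`, `score_history`, and `budget` are all assumptions made for illustration. It only shows the general pattern of weighting a token's saliency by how stable that saliency is across interaction steps, then keeping the top-scoring tokens under a fixed budget.

```python
from statistics import mean, pvariance


def select_tokens(score_history, budget):
    """Return the indices of cached tokens to keep.

    score_history[t][i] is a hypothetical saliency score for cached token i
    at interaction step t (for example, attention mass it received).
    This is an illustrative stand-in, not STaR-KV's actual scoring.
    """
    num_tokens = len(score_history[0])
    combined = []
    for i in range(num_tokens):
        per_step = [step[i] for step in score_history]
        # Assumed stability weight: tokens whose saliency fluctuates
        # across steps are down-weighted.
        stability = 1.0 / (1.0 + pvariance(per_step))
        combined.append(mean(per_step) * stability)
    # Rank by combined score, keep the top `budget` tokens,
    # and return them in their original cache order.
    ranked = sorted(range(num_tokens), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:budget])


# Three interaction steps, four cached tokens.
history = [
    [0.9, 0.1, 0.5, 0.0],
    [0.8, 0.9, 0.5, 0.1],
    [0.9, 0.0, 0.6, 0.0],
]
kept = select_tokens(history, budget=2)
print(kept)  # [0, 2]
```

In this toy input, token 1 spikes once but is otherwise low, so the stability weight pushes it below the consistently salient tokens 0 and 2.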
