V2P: Visual Attention Calibration for GUI Grounding via Background Suppression and Center Peaking

Jikai Chen, Long Chen, Dong Wang, Qinglin Su, Zhixuan Chu, Bingguang Hao, Leilei Gan, Chenyi Zhuang, Jinjie Gu

🏛 Institutions: ZJU, Inclusion AI, Ant Group
📅 Date: January 11, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: GUI grounding, visual attention, background suppression, Fitts' law, V2P

TLDR

V2P improves attention-based GUI grounding by suppressing irrelevant background regions and using a Fitts’ Law-inspired Gaussian peak to emphasize an element’s actionable center over its edges. The paper reports 92.4% on ScreenSpot-v2 and 52.5% on ScreenSpot-Pro, with ablations confirming both components matter.
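The two ideas in the summary can be illustrated with a minimal sketch. This is not the paper's formulation: the function name `gaussian_target_map`, the `sigma_scale` parameter, and the choice to zero out every cell outside the bounding box are assumptions made for illustration. The sketch builds a target attention map over a patch grid that is zero on the background and peaks at the element's center, with a Gaussian width that scales with element size.

```python
import numpy as np


def gaussian_target_map(grid_h, grid_w, box, sigma_scale=0.25):
    """Hypothetical target attention map over a grid_h x grid_w patch grid.

    box is (x0, y0, x1, y1) in grid-cell units. sigma_scale is an assumed
    hyperparameter: the Gaussian width is a fraction of the box size, so
    larger elements get a broader peak.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max((x1 - x0) * sigma_scale, 1e-6)
    sy = max((y1 - y0) * sigma_scale, 1e-6)

    # Coordinates of each cell's center.
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    xc, yc = xs + 0.5, ys + 0.5

    # Center peaking: weight decays with distance from the element center.
    peak = np.exp(-0.5 * (((xc - cx) / sx) ** 2 + ((yc - cy) / sy) ** 2))

    # Background suppression: cells outside the element get zero weight.
    inside = (xc >= x0) & (xc <= x1) & (yc >= y0) & (yc <= y1)
    target = np.where(inside, peak, 0.0)

    # Normalize to a distribution so it can supervise attention weights.
    total = target.sum()
    return target / total if total > 0 else target


target = gaussian_target_map(8, 8, (2, 2, 6, 6))
```

In this sketch, cells near the middle of the box receive more weight than cells at its corners, and background cells receive none. How V2P actually combines such a map with the model's attention is described in the paper itself.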
