GUI Agents Papers

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin, Juan A. Rodriguez, Montek Kalsi, Rabiul Awal, Nicolas Chapados, M. Tamer Özsu, Aishwarya Agrawal, David Vazquez, Christopher Pal, Perouz Taslakian, Spandana Gella, Sai Rajeswar

🏛 Institutions
Mila, Université de Montréal, ServiceNow, University of Waterloo, NUS, École de Technologie Supérieure, Polytechnique Montréal
📅 Date
March 19, 2025
📑 Publisher
ICML 2025 (Poster)
💻 Env
Desktop
🔑 Keywords
TLDR

UI-Vision is a desktop GUI benchmark with dense human-demonstration annotations across 83 applications, covering three tasks: element grounding, layout grounding, and action prediction. It reveals persistent weaknesses of current agents in handling professional software, in spatial reasoning, and in actions such as drag-and-drop, and it provides an open benchmark for desktop-centric GUI evaluation.
