GUI Agents Papers

DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents

Yibin Xu, Liang Yang, Hao Chen, Hua Wang, Zhi Chen, Yaohua Tang

🏛 Institutions
Moore Threads AI
📅 Date
March 14, 2025
📑 Publisher
arXiv
💻 Env
Desktop
TLDR

DeskVision introduces AutoCaptioner, a pipeline for generating richly described desktop GUI data, then uses it to build a large dataset and the DeskVision-Eval benchmark. The paper also trains GUIExplorer and shows that the added data materially improves desktop element understanding and grounding.
