GUI Agents Papers

SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan, Zhaohui Hou, Mingjie Zhan

🏛 Institutions
SenseTime Research, Beijing University of Posts and Telecommunications, MMLab, CUHK
📅 Date
March 5, 2025
📑 Publisher
CVPR 2025 (Poster)
💻 Env
Desktop Mobile Web
TLDR

SpiritSight is an end-to-end, vision-based GUI agent that acts from a single screenshot while keeping element grounding accurate across platforms. The paper pairs the GUI-Lasagne dataset with Universal Block Parsing (UBP), which reduces the ambiguity introduced by dynamic-resolution visual inputs. It reports gains on web, mobile, and desktop benchmarks.
