
SpiritSight Agent: Advanced GUI Agent with One Look

Zhiyuan Huang, Ziming Cheng, Junting Pan, Zhaohui Hou, Mingjie Zhan

🏛 Institutions
SenseTime Research, Beijing University of Posts and Telecommunications, MMLab, CUHK
📅 Date
March 5, 2025
📑 Publisher
CVPR 2025 (Poster)
💻 Env
Desktop, Mobile, Web
🔑 Keywords
TLDR

SpiritSight is an end-to-end GUI agent designed to act from a single screenshot while retaining strong cross-platform grounding accuracy. The paper pairs the GUI-Lasagne dataset with Universal Block Parsing to reduce dynamic-resolution ambiguity and reports gains across web, mobile, and desktop benchmarks.
