GUI Agents Papers

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong

🏛 Institutions
HKU, Salesforce AI Research
📅 Date
December 5, 2024
📑 Publisher
ICML 2025 (Poster)
💻 Env
General GUI
🔑 Keywords
TLDR

Aguvis is a pure-vision GUI agent that removes textual interface representations and operates directly on screen images. It combines a large grounding-and-reasoning dataset with a two-stage training pipeline and inner-monologue reasoning, reporting strong offline and online performance without relying on closed-source models.
