GUI Agents Papers

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong

🏛 Institutions
HKU, Salesforce AI Research
📅 Date
December 5, 2024
📑 Publisher
ICML 2025 (Poster)
💻 Env
General GUI
TLDR

Aguvis is a pure-vision GUI agent that removes textual interface representations and operates directly on screen images. It combines a large grounding-and-reasoning dataset with a two-stage training pipeline and inner-monologue reasoning, reporting strong offline and online performance without relying on closed-source models.
