GUI Agents Papers

Ponder & Press: Advancing Visual GUI Agent towards General Computer Control

Yiqin Wang, Haoji Zhang, Jingqi Tian, Yansong Tang

🏛 Institutions
Shenzhen International Graduate School, Tsinghua
📅 Date
December 2, 2024
📑 Publisher
Findings of ACL 2025
💻 Env
Desktop, Mobile, Web
TLDR

Ponder & Press is a pure-vision, divide-and-conquer framework for GUI control that separates high-level instruction interpretation from element localization. It pairs a general-purpose MLLM interpreter with a GUI-specific locator, working from screenshots alone. The authors report a 22.5% improvement over existing models on the ScreenSpot GUI grounding benchmark, along with strong performance across web, desktop, and mobile GUI benchmarks.
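
The interpreter/locator split described above can be sketched as a two-stage loop. This is only an illustration of the divide-and-conquer idea, not the authors' implementation: the names `Action`, `GroundedAction`, `run_step`, `stub_interpreter`, and `stub_locator` are hypothetical, and the two stubs stand in for the MLLM interpreter and the GUI-specific locator, which in practice would be model calls on a real screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Action:
    """One low-level step produced by the interpreter (hypothetical schema)."""
    kind: str       # e.g. "click" or "type"
    target: str     # natural-language description of the GUI element
    text: str = ""  # text to enter, used only for "type" actions


@dataclass(frozen=True)
class GroundedAction:
    """An action paired with the screen coordinates chosen by the locator."""
    action: Action
    point: Tuple[int, int]


# Stage 1: instruction + screenshot -> element descriptions and actions.
Interpreter = Callable[[str, bytes], List[Action]]
# Stage 2: element description + screenshot -> pixel coordinates.
Locator = Callable[[str, bytes], Tuple[int, int]]


def run_step(instruction: str, screenshot: bytes,
             interpret: Interpreter, locate: Locator) -> List[GroundedAction]:
    """Interpret the instruction first, then ground each action separately."""
    grounded = []
    for action in interpret(instruction, screenshot):
        point = locate(action.target, screenshot)
        grounded.append(GroundedAction(action, point))
    return grounded


def stub_interpreter(instruction: str, screenshot: bytes) -> List[Action]:
    # Assumption: a real interpreter would be a general-purpose MLLM call.
    return [Action("click", "search box"),
            Action("type", "search box", instruction)]


def stub_locator(description: str, screenshot: bytes) -> Tuple[int, int]:
    # Assumption: a real locator would be a GUI-specific grounding model.
    known_elements = {"search box": (320, 48)}
    return known_elements[description]


if __name__ == "__main__":
    for step in run_step("weather", b"", stub_interpreter, stub_locator):
        print(step.action.kind, step.action.target, step.point)
```

Keeping the two stages behind separate callables mirrors the separation the summary describes: either stage can be swapped for a different model without touching the other.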
