GUI Agents Papers

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

Songze Li, Xiaoke Guo, Tianqi Liu, Biao Yi, Zhaoyan Gong, Zhiqiang Liu, Huajun Chen, Wen Zhang

🏛 Institutions
ZJU
📅 Date
April 8, 2026
📑 Publisher
Findings of ACL 2026
💻 Env
General GUI
🔑 Keywords
TLDR

UILoop treats GUI reasoning as a cyclic Screen → UI elements → Action process, enabling MLLMs to explicitly learn where key UI elements are located, what they do, and how to use them. It introduces UI Comprehension-Bench (26K samples) and achieves state-of-the-art GUI reasoning performance with improved interpretability.
