What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

Songze Li, Xiaoke Guo, Tianqi Liu, Biao Yi, Zhaoyan Gong, Zhiqiang Liu, Huajun Chen, Wen Zhang

🏛 Institutions: ZJU
📅 Date: April 8, 2026
📑 Publisher: Findings of ACL 2026
💻 Env: General GUI
🔑 Keywords: GUI grounding, benchmark, UILoop, UI comprehension

TLDR

UILoop treats GUI reasoning as a cyclic Screen → UI elements → Action process, enabling MLLMs to explicitly learn the localization, semantic functions, and usage of key UI elements. It introduces UI Comprehension-Bench (26K samples) and achieves state-of-the-art GUI reasoning performance with improved interpretability.
