GUI Agents Papers

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Yuhang Liu, Pengxiang Li, Congkai Xie, Xavier Hu, Xiaotian Han, Shengyu Zhang, Hongxia Yang, Fei Wu

🏛 Institutions
ZJU, Dalian University of Technology, Reallm Labs, PolyU
📅 Date
April 19, 2025
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

InfiGUI-R1 is trained to shift GUI agents from reactive action prediction toward explicit deliberative reasoning. Its Actor2Reasoner pipeline first distills cross-modal spatial reasoning into the model, then uses reinforcement learning with sub-goal guidance and failure-recovery scenarios to strengthen planning and recovery.
