Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction
Yuzhe Zhang, Xianwei Xue, Xingyong Wu, Mengke Chen, Chen Liu, Xinran He, Run Shao, Feiran Liu, Huanmin Xu, Qiutong Pan, Haiwei Wang
- 🏛 Institutions
- Unknown
- 📅 Date
- April 7, 2026
- 📑 Publisher
- ACL 2026
- 💻 Env
- Mobile
TLDR
VeriGUI treats action-effect verification as a first-class RL objective to handle non-deterministic GUI environments with network delays, rendering lags, and system failures. A Thinking-Verification-Action-Expectation framework identifies failures; two-phase training with Robust SFT and GRPO using asymmetric verification rewards reduces failure loops. A new Robustness Benchmark built on AndroidControl evaluates failure recognition and correction.
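The verify-then-act idea in the summary can be illustrated with a toy loop. This is a minimal sketch and not the paper's implementation: `ToyScreen`, `run_with_verification`, and the retry-on-failure policy are all hypothetical names and assumptions made for illustration. The environment silently drops a tap a fixed number of times, standing in for rendering lag or lost input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyScreen:
    """Hypothetical stand-in for a GUI that silently drops some taps."""
    drops_remaining: int
    page: str = "home"

    def tap(self, target: str) -> None:
        if self.drops_remaining > 0:
            self.drops_remaining -= 1  # simulated lag or lost input: nothing changes
            return
        self.page = target


def run_with_verification(screen: ToyScreen, target: str, max_steps: int = 5) -> List[str]:
    """Act, record the expected effect, and verify it before moving on."""
    log: List[str] = []
    expectation: Optional[str] = None
    for _ in range(max_steps):
        # Verification: compare the current observation with the prior expectation.
        if expectation is not None:
            if screen.page == expectation:
                log.append("verified")
                return log
            log.append("failed")  # effect not observed, so correct by retrying
        # Action and Expectation: act, then state what the screen should show next.
        screen.tap(target)
        expectation = target
        log.append("tap")
    return log  # step budget exhausted without a verified effect


print(run_with_verification(ToyScreen(drops_remaining=1), "settings"))
```

An agent that skips the verification step would assume the first tap succeeded and continue from the wrong screen; here the mismatch between expectation and observation is what triggers the retry.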
Related papers
- Generalization in Online Reinforcement Learning for Mobile Agents · March 8, 2026 · arXiv
- AgentCPM‑GUI: Building Mobile‑Use Agents with Reinforcement Fine‑Tuning · June 2, 2025 · EMNLP 2025 System Demonstrations
- UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning · March 27, 2025 · arXiv
- GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents · April 14, 2025 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post
- CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning · March 3, 2026 · arXiv