GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models
Shaokang Wang, Pei Fu, Ruoceng Zhang, Shaojie Zhang, Xiuwen Xi, Jiahui Yang, Bin Qin, Ying Huang, Zhenbo Luo, Jian Luan
- 🏛 Institutions: MiLM Plus, Xiaomi
- 📅 Date: January 26, 2026
- 📑 Publisher: arXiv
- 💻 Env: General GUI
- 🔑 Keywords
TLDR
GAIA trains an Intuitive Critic Model that judges the immediate correctness of candidate GUI actions before execution. It then uses a data flywheel that recycles agent-generated positive and negative action samples to iteratively improve the critic, yielding better test-time performance for both open-source and closed-source GUI agents.
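The two ideas in the summary, critic-guided selection of a candidate action before execution and recycling of labeled agent actions as training data, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Critic` signature, the function names `select_action` and `collect_flywheel_samples`, and the `is_correct` label source are all hypothetical, and the sketch assumes the critic returns a single correctness score per action.

```python
from typing import Callable, List, Tuple

Action = str
# Hypothetical critic interface: (task/screen context, candidate action) -> correctness score.
Critic = Callable[[str, Action], float]


def select_action(context: str, candidates: List[Action], critic: Critic) -> Tuple[Action, float]:
    """Score every candidate before execution and return the highest-scoring one."""
    scored = [(critic(context, action), action) for action in candidates]
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_score


def collect_flywheel_samples(
    context: str,
    candidates: List[Action],
    is_correct: Callable[[Action], bool],  # hypothetical label source, assumed for illustration
) -> List[Tuple[str, Action, int]]:
    """Turn sampled actions into positive (1) and negative (0) examples for the next critic round."""
    return [(context, action, 1 if is_correct(action) else 0) for action in candidates]


# Toy critic that prefers one specific action; a real critic would be a trained model.
def toy_critic(context: str, action: Action) -> float:
    return 1.0 if action == "click(submit)" else 0.2


candidates = ["click(cancel)", "click(submit)", "scroll(down)"]
chosen, score = select_action("Submit the form", candidates, toy_critic)
samples = collect_flywheel_samples("Submit the form", candidates, lambda a: a == "click(submit)")
```

In this sketch the agent proposes several candidates, only the top-scored one would be executed, and every candidate (accepted or not) becomes a labeled sample that a later training round could consume.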
Related papers
- Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding · December 5, 2025 · arXiv
- UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning · September 2, 2025 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- Agentic Test-Time Scaling for WebAgents · February 12, 2026 · arXiv
- OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models · December 18, 2025 · arXiv
- Scaling Agents for Computer Use · October 2, 2025 · arXiv