Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
Yibo Wang, Guangda Huzhang, Yuwei Hu, Yu Xia, Shiyin Lu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
- 🏛 Institutions
- National Key Laboratory for Novel Software Technology, NJU, Ovis Team, Alibaba Group
- 📅 Date
- February 14, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
This paper trains GUI agents with an agentic-Q model that estimates each action's contribution to task completion and a step-wise policy optimization routine decoupled from online interaction. The design keeps data collection manageable while stabilizing updates and improving navigation and grounding performance.
Related papers
- OS-Themis: A Scalable Critic Framework for Generalist GUI RewardsMarch 19, 2026 · arXiv
- CGL: Advancing Continual GUI Learning via Reinforcement Fine-TuningMarch 3, 2026 · arXiv
- GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RLFebruary 25, 2026 · arXiv
- Autonomous Continual Learning of Computer-Use Agents for Environment AdaptationFebruary 10, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic OptimizationJanuary 30, 2026 · arXiv
- GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI AgentsJanuary 14, 2026 · arXiv