Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization

Yibo Wang, Guangda Huzhang, Yuwei Hu, Yu Xia, Shiyin Lu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

🏛 Institutions: National Key Laboratory for Novel Software Technology, NJU, Ovis Team, Alibaba Group
📅 Date: February 14, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: reinforcement learning, agentic-Q estimation, step-wise policy optimization, Ovis2.5-9B, grounding

TLDR

This paper trains GUI agents with an agentic-Q model that estimates each action's contribution to task completion and a step-wise policy optimization routine decoupled from online interaction. The design keeps data collection manageable while stabilizing updates and improving navigation and grounding performance.
