
From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation

Zezhou Wang, Ziyun Zhang, Xiaoyi Zhang, Zhuzhong Qian, Yan Lu

🏛 Institutions
NJU, PKU, MSR Asia
📅 Date
January 9, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

BEPA improves end-to-end GUI-agent training with verifiable rewards by turning scarce off-policy expert traces into policy-aligned guidance through self-rolled reachable trajectories and a dynamically updated per-task cache. On OSWorld-Verified it raises UI-TARS-1.5-7B from 22.87% to 32.13%, with additional gains on MMBench-GUI and Online-Mind2Web.
