From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation
Zezhou Wang, Ziyun Zhang, Xiaoyi Zhang, Zhuzhong Qian, Yan Lu
- 🏛 Institutions: NJU, PKU, MSR Asia
- 📅 Date: January 9, 2026
- 📑 Publisher: arXiv
- 💻 Env: General GUI
TLDR
BEPA improves end-to-end GUI-agent training with verifiable rewards by turning scarce off-policy expert traces into policy-aligned guidance through self-rolled reachable trajectories and a dynamically updated per-task cache. On OSWorld-Verified, it raises UI-TARS-1.5-7B from 22.87% to 32.13% (a gain of 9.26 percentage points), with additional gains on MMBench-GUI and Online-Mind2Web.
Related papers (24)
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation · September 28, 2025 · arXiv
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents · August 19, 2025 · ICLR 2026 (Poster)
- GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning · May 29, 2026 · arXiv
- LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning · May 8, 2026 · arXiv
- OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards · March 19, 2026 · arXiv
- CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning · March 3, 2026 · arXiv
- GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL · February 25, 2026 · arXiv
- Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization · February 14, 2026 · arXiv
- Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation · February 10, 2026 · arXiv
- Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use Agents · February 3, 2026 · arXiv
- Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction · January 31, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization · January 30, 2026 · arXiv
- BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents · January 29, 2026 · arXiv
- GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents · January 14, 2026 · arXiv
- GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning · December 2, 2025 · arXiv
- HiconAgent: History Context-aware Policy Optimization for GUI Agents · December 1, 2025 · arXiv
- Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation · November 27, 2025 · CVPR 2026
- R-WoM: Retrieval-augmented World Model For Computer-use Agents · October 13, 2025 · ICLR 2026 (Poster)
- Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness · October 2, 2025 · ICLR 2026 (Poster)
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · July 29, 2025 · CVPR 2026 Findings
- ProgRM: Build Better GUI Agents with Progress Rewards · May 23, 2025 · arXiv
- GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents · May 21, 2025 · NeurIPS 2025 (Poster)
- Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning · May 18, 2025 · NeurIPS 2025 (Poster)
- A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning · April 29, 2025 · arXiv