Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning

Qingyuan Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao

🏛 Institutions: University of Liverpool, University of Southampton, Huawei Noah's Ark Lab, Tianjin University, UCL
📅 Date: February 11, 2025
📑 Publisher: arXiv
💻 Env: Mobile, Web
🔑 Keywords: reinforcement learning, subgoal-conditioned RL, SGC-ELBO, learning efficiency, VSC-RL

TLDR

This paper reformulates long-horizon VLM-agent training as a variational subgoal-conditioned reinforcement learning problem with the SGC-ELBO objective. Across mobile-device and web-control benchmarks, VSC-RL improves both learning efficiency and final performance over prior RL methods.


Related papers (24)

GUI-R1: A Generalist R1-Style Vision-Language Action Model for GUI Agents

April 14, 2025 · arXiv
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research

May 25, 2026 · arXiv
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

April 13, 2026 · arXiv
Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

April 8, 2026 · arXiv
Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction

April 7, 2026 · ACL 2026
WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale

March 2026 · Blog Post
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

March 25, 2026 · arXiv
Generalization in Online Reinforcement Learning for Mobile Agents

March 8, 2026 · arXiv
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

March 5, 2026 · arXiv
OpAgent: Operator Agent for Web Navigation

February 14, 2026 · arXiv
Adaptive Milestone Reward for GUI Agents

February 12, 2026 · arXiv
UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents

February 5, 2026 · arXiv
WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

January 5, 2026 · arXiv
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

December 26, 2025 · arXiv
WebServ: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale

October 17, 2025 · arXiv
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

September 28, 2025 · arXiv
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

June 9, 2025 · SEA @ NeurIPS 2025 (Oral)
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

June 2, 2025 · EMNLP 2025 System Demonstrations
ZeroGUI: Automating Online GUI Learning at Zero Human Cost

May 29, 2025 · arXiv
WebDancer: Towards Autonomous Information Seeking Agency

May 28, 2025 · NeurIPS 2025 (Poster)
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

May 22, 2025 · EMNLP 2025 (Poster)
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning

May 18, 2025 · ICLR 2026 (Poster)
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

March 27, 2025 · arXiv
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

December 17, 2024 · ICML 2025 (Poster)