OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
- 🏛 Institutions: Shanghai AI Laboratory, HKU, JHU, SJTU, Oxford, HKUST
- 📅 Date: December 27, 2024
- 📑 Publisher: ACL 2025
- 💻 Env: General GUI
- 🔑 Keywords
TLDR
OS-Genesis tackles the lack of high-quality GUI trajectories by synthesizing them without preset tasks or human demonstrations. It first explores with step-level interactions, then retrospectively derives tasks and filters the resulting trajectories with a reward model, producing more diverse training data for GUI agents.
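The explore-then-derive-then-filter flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Step`, `Trajectory`, `synthesize`, `toy_derive_task`, and `toy_score` are hypothetical, the task-derivation and reward functions are toy stand-ins for model calls, and the hard threshold is an assumption about how the reward score is applied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One explored interaction: screen before, action taken, screen after."""
    before: str
    action: str
    after: str


@dataclass
class Trajectory:
    task: str          # instruction derived after the fact
    steps: List[Step]
    reward: float      # score from the (hypothetical) reward model


def synthesize(
    episodes: List[List[Step]],
    derive_task: Callable[[List[Step]], str],
    score: Callable[[str, List[Step]], float],
    threshold: float,
) -> List[Trajectory]:
    """Derive a task for each explored episode, then keep the well-scored ones."""
    kept = []
    for steps in episodes:
        task = derive_task(steps)      # retrospective task derivation
        reward = score(task, steps)    # reward-model stand-in
        if reward >= threshold:        # hard filter (an assumption)
            kept.append(Trajectory(task, steps, reward))
    return kept


# Toy stand-ins so the sketch runs without any model.
def toy_derive_task(steps: List[Step]) -> str:
    return "Reach '" + steps[-1].after + "' from '" + steps[0].before + "'"


def toy_score(task: str, steps: List[Step]) -> float:
    # Fraction of steps that actually changed the screen.
    changed = sum(1 for s in steps if s.before != s.after)
    return changed / len(steps)


episodes = [
    [Step("home", "tap Settings", "settings"),
     Step("settings", "tap Wi-Fi", "wifi")],
    [Step("home", "tap blank area", "home")],
]
kept = synthesize(episodes, toy_derive_task, toy_score, threshold=0.5)
```

In this toy run the second episode is dropped because its only action leaves the screen unchanged. In practice both callables would be backed by models.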
Related papers
- Video-Based Reward Modeling for Computer-Use Agents · March 10, 2026 · arXiv
- Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents · July 2025 · Findings of ACL 2025
- UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents · May 27, 2025 · NeurIPS 2025 (Poster)
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents · May 21, 2025 · NeurIPS 2025 (Spotlight)
- OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards · March 19, 2026 · arXiv
- Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision · February 15, 2026 · arXiv