OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
- 🏛 Institutions
- Shanghai AI Laboratory, HKU, JHU, SJTU, Oxford, HKUST
- 📅 Date
- December 27, 2024
- 📑 Publisher
- ACL 2025
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
OS-Genesis tackles the lack of high-quality GUI trajectories by synthesizing them without preset tasks or human demonstrations. It first explores with step-level interactions, then retrospectively derives tasks and filters the resulting trajectories with a reward model, producing more diverse training data for GUI agents.
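The explore, derive, filter order described above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TOY_GUI` transition table, the `derive_task` template (a stand-in for a model that would write the instruction), the `toy_reward` heuristic (a stand-in for a learned reward model), and the threshold are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    before: str  # screen before the action
    action: str  # e.g. "click:settings"
    after: str   # screen after the action


@dataclass
class Trajectory:
    task: str
    steps: List[Step]
    reward: float = 0.0


# Hypothetical toy GUI: screen -> {action: next screen}.
TOY_GUI = {
    "home": {"click:settings": "settings", "click:mail": "inbox"},
    "settings": {"click:wifi": "wifi_panel", "click:back": "home"},
    "inbox": {"click:compose": "draft", "click:back": "home"},
    "wifi_panel": {"click:back": "settings"},
    "draft": {"click:back": "inbox"},
}


def explore(start: str, n_steps: int, rng: random.Random) -> List[Step]:
    """Task-free exploration: take random step-level actions and record them."""
    steps, state = [], start
    for _ in range(n_steps):
        action = rng.choice(sorted(TOY_GUI[state]))
        nxt = TOY_GUI[state][action]
        steps.append(Step(state, action, nxt))
        state = nxt
    return steps


def derive_task(steps: List[Step]) -> str:
    """Retrospectively name a task from what was observed (template stand-in)."""
    return f"Starting from '{steps[0].before}', reach '{steps[-1].after}'"


def toy_reward(traj: Trajectory) -> float:
    """Stand-in reward: fraction of steps that land on a not-yet-seen screen."""
    seen = {traj.steps[0].before}
    new = 0
    for s in traj.steps:
        if s.after not in seen:
            new += 1
            seen.add(s.after)
    return new / len(traj.steps)


def synthesize(n_rollouts: int, n_steps: int, threshold: float,
               seed: int = 0) -> List[Trajectory]:
    """Explore first, derive the task afterwards, keep only high-reward results."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_rollouts):
        steps = explore("home", n_steps, rng)
        traj = Trajectory(derive_task(steps), steps)
        traj.reward = toy_reward(traj)
        if traj.reward >= threshold:
            kept.append(traj)
    return kept
```

Calling `synthesize(20, 3, 0.5)` returns only those rollouts whose toy reward is at least 0.5. The point of the sketch is the ordering: the task text is produced after the interaction has been recorded, rather than being fixed in advance.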
Related papers (24)
- Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining · May 14, 2026 · arXiv
- Video-Based Reward Modeling for Computer-Use Agents · March 10, 2026 · arXiv
- Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents · July 2025 · Findings of ACL 2025
- UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents · May 27, 2025 · NeurIPS 2025 (Poster)
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents · May 21, 2025 · NeurIPS 2025 (Spotlight)
- OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards · March 19, 2026 · arXiv
- Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision · February 15, 2026 · arXiv
- GUIGuard: Toward a General Framework for Privacy-Preserving GUI Agents · January 26, 2026 · arXiv
- MagicGUI-RMS: A Multi-Agent Reward Model System for Self-Evolving GUI Agents via Automated Feedback Reflux · January 19, 2026 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging · November 7, 2025 · arXiv
- VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos · October 22, 2025 · arXiv
- Scaling Synthetic Task Generation for Agents via Exploration · September 29, 2025 · ICLR 2026 (Poster)
- ProgRM: Build Better GUI Agents with Progress Rewards · May 23, 2025 · arXiv
- Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis · May 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)
- TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents · April 17, 2025 · AAAI 2026
- UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis · April 15, 2025 · Findings of ACL 2025
- Falcon-UI: Understanding GUI Before Following User Instructions · December 12, 2024 · arXiv
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction · December 5, 2024 · ICML 2025 (Poster)
- EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data · October 25, 2024 · arXiv
- OmniParser for Pure Vision Based GUI Agent · August 1, 2024 · arXiv
- Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding · June 27, 2024 · EMNLP 2024 (Poster)
- VGA: Vision GUI Assistant - Minimizing Hallucinations through Image-Centric Fine-Tuning · June 20, 2024 · Findings of EMNLP 2024
- GUICourse: From General Vision Language Model to Versatile GUI Agent · June 17, 2024 · ACL 2025
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding · February 7, 2024 · IJCAI 2024