TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution
Deyang Jiang, Jing Huang, Xuanle Zhao, Lei Chen, Liming Zheng, Fanfan Liu, Haibo Qiu, Peng Shi, Zhixiong Zeng
- 🏛 Institutions: Meituan
- 📅 Date: February 10, 2026
- 📑 Publisher: arXiv
- 💻 Env: General GUI
- 🔑 Keywords
TLDR
TreeCUA tackles the scaling bottleneck in GUI planning by organizing exploration trajectories as reusable tree structures with verification, summarization, and evaluation. The resulting data supports TreeCUA-DPO, which improves planning quality and out-of-domain generalization.
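To make the idea of reusable tree-structured trajectories concrete, here is a minimal sketch of how exploration runs that share a common action prefix could be stored once in a tree, with a verification flag on each endpoint. It is not the paper's implementation: the class names `Node` and `TrajectoryTree`, the action strings, and the boolean `verified` flag are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One GUI state reached by taking `action` from the parent state."""
    action: str
    verified: Optional[bool] = None  # None until a (hypothetical) verifier has judged it
    children: dict = field(default_factory=dict)


class TrajectoryTree:
    """Stores trajectories so that shared action prefixes are kept only once."""

    def __init__(self):
        self.root = Node(action="<start>")
        self.node_count = 0  # number of stored steps, excluding the root

    def add(self, actions, verified):
        """Insert a trajectory, reusing any prefix that already exists."""
        node = self.root
        for action in actions:
            if action not in node.children:
                node.children[action] = Node(action=action)
                self.node_count += 1
            node = node.children[action]
        node.verified = verified

    def trajectories(self):
        """Yield (actions, verified) for every endpoint that has a verdict."""
        stack = [(self.root, [])]
        while stack:
            node, path = stack.pop()
            if node is not self.root and node.verified is not None:
                yield path, node.verified
            for action, child in node.children.items():
                stack.append((child, path + [action]))


# Illustrative data only: three explorations, two of which share a two-step prefix.
runs = [
    (["open_settings", "click_display", "set_dark_mode"], True),
    (["open_settings", "click_display", "set_font_size"], True),
    (["open_settings", "click_sound"], False),
]

tree = TrajectoryTree()
for actions, ok in runs:
    tree.add(actions, ok)

flat_steps = sum(len(actions) for actions, _ in runs)
print(f"steps stored flat: {flat_steps}, steps stored in tree: {tree.node_count}")
```

In this toy case the three runs contain eight steps when stored separately but only five nodes in the tree, which is the kind of reuse a tree layout makes possible. How TreeCUA actually verifies, summarizes, and evaluates nodes, and how TreeCUA-DPO draws training data from the tree, is described in the paper and is not modeled here.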
Related papers (24)
- Demo2Tutorial: From Human Experience to Multimodal Software Tutorials · June 2, 2026 · arXiv
- Executable Agentic Memory for GUI Agent · May 12, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization · January 30, 2026 · arXiv
- TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents · April 17, 2025 · AAAI 2026
- MobileWorldBench: Towards Semantic World Modeling For Mobile Agents · December 16, 2025 · arXiv
- WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation · October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent · May 20, 2025 · arXiv
- LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects · April 28, 2025 · TMLR 2025
- UFO2: The Desktop AgentOS · April 20, 2025 · arXiv
- WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms · April 16, 2025 · EACL 2026 (Oral)
- LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications · March 4, 2025 · NAACL 2025 System Demonstrations
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks · July 7, 2024 · NeurIPS 2024 Datasets and Benchmarks Track (Poster)
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis · July 24, 2023 · ICLR 2024 (Oral)
- Naive Visual Memory is Not Enough: A Failure-Mode Study of GUI Agents · June 12, 2026 · arXiv
- STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · June 1, 2026 · arXiv
- GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning · May 29, 2026 · arXiv
- MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents · May 18, 2026 · arXiv
- Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining · May 14, 2026 · arXiv
- LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning · May 8, 2026 · arXiv
- Step-level Optimization for Efficient Computer-use Agents · April 29, 2026 · arXiv
- Training Computer Use Agents to Assess the Usability of Graphical User Interfaces · April 28, 2026 · arXiv
- AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark · April 27, 2026 · arXiv
- Human-Guided Harm Recovery for Computer Use Agents · April 20, 2026 · arXiv
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding · April 15, 2026 · arXiv