TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution

Deyang Jiang, Jing Huang, Xuanle Zhao, Lei Chen, Liming Zheng, Fanfan Liu, Haibo Qiu, Peng Shi, Zhixiong Zeng

🏛 Institutions: Meituan
📅 Date: February 10, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: planning, multi-agent, tree-structured evolution, trajectory generation, TreeCUA, TreeCUA-DPO

TLDR

TreeCUA tackles the scaling bottleneck in GUI planning by organizing exploration trajectories as reusable tree structures with verification, summarization, and evaluation. The resulting data supports TreeCUA-DPO, which improves planning quality and out-of-domain generalization.
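To make the idea of reusable tree-structured trajectories concrete, here is a minimal sketch. It is not the paper's implementation: `Node`, `verify_tree`, `trajectories`, and `preference_pairs` are hypothetical names, the verifier is assumed to be a plain boolean callback, and pairing verified and unverified sibling branches into DPO-style preferences is only an assumption about how tree data could feed preference training.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Node:
    """One explored step: the action taken to reach this GUI state."""
    action: str                      # hypothetical action label
    verified: bool = False           # filled in by verify_tree
    children: List["Node"] = field(default_factory=list)

    def add(self, action: str) -> "Node":
        """Branch from this state; the prefix up to here is stored only once."""
        child = Node(action)
        self.children.append(child)
        return child


def verify_tree(root: Node, verifier: Callable[[List[str]], bool]) -> None:
    """Mark every node by calling verifier on its root-to-node action path."""
    stack: List[Tuple[Node, List[str]]] = [(root, [root.action])]
    while stack:
        node, path = stack.pop()
        node.verified = verifier(path)
        for child in node.children:
            stack.append((child, path + [child.action]))


def trajectories(root: Node) -> Iterator[List[str]]:
    """Yield root-to-leaf action paths in which every node is verified."""
    def walk(node: Node, prefix: List[str]) -> Iterator[List[str]]:
        if not node.verified:
            return
        path = prefix + [node.action]
        if not node.children:
            yield path
        for child in node.children:
            yield from walk(child, path)
    yield from walk(root, [])


def preference_pairs(root: Node) -> List[Tuple[Tuple[str, ...], str, str]]:
    """Collect (shared prefix, chosen action, rejected action) from siblings.

    Assumption: a verified branch is preferred over an unverified sibling.
    """
    pairs: List[Tuple[Tuple[str, ...], str, str]] = []

    def walk(node: Node, prefix: List[str]) -> None:
        path = prefix + [node.action]
        good = [c for c in node.children if c.verified]
        bad = [c for c in node.children if not c.verified]
        for g in good:
            for b in bad:
                pairs.append((tuple(path), g.action, b.action))
        for child in good:
            walk(child, path)

    walk(root, [])
    return pairs
```

Because branches hang off shared nodes, a common prefix is explored and stored once, and each new branch costs only its own extra steps. That is the intuition behind calling the trajectories "reusable" in this sketch.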


Related papers (24)

Demo2Tutorial: From Human Experience to Multimodal Software Tutorials
June 2, 2026 · arXiv

Executable Agentic Memory for GUI Agent
May 12, 2026 · arXiv

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
January 30, 2026 · arXiv

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
April 17, 2025 · AAAI 2026

MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
December 16, 2025 · arXiv

WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation
October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models

Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent
May 20, 2025 · arXiv

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
April 28, 2025 · TMLR 2025

UFO2: The Desktop AgentOS
April 20, 2025 · arXiv

WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
April 16, 2025 · EACL 2026 (Oral)

LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
March 4, 2025 · NAACL 2025 System Demonstrations

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
July 7, 2024 · NeurIPS 2024 Datasets and Benchmarks Track (Poster)

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
July 24, 2023 · ICLR 2024 (Oral)

Naive Visual Memory is Not Enough: A Failure-Mode Study of GUI Agents
June 12, 2026 · arXiv

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models
June 1, 2026 · arXiv

GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning
May 29, 2026 · arXiv

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents
May 18, 2026 · arXiv

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
May 14, 2026 · arXiv

LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
May 8, 2026 · arXiv

Step-level Optimization for Efficient Computer-use Agents
April 29, 2026 · arXiv

Training Computer Use Agents to Assess the Usability of Graphical User Interfaces
April 28, 2026 · arXiv

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark
April 27, 2026 · arXiv

Human-Guided Harm Recovery for Computer Use Agents
April 20, 2026 · arXiv

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
April 15, 2026 · arXiv