GUI Agents Papers

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

Bofei Zhang, Zirui Shang, Zhi Gao, Wang Zhang, Rui Xie, Xiaojian Ma, Tao Yuan, Xinxiao Wu, Song-Chun Zhu, Qing Li

🏛 Institutions
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing Institute of Technology, PKU, SJTU, Tsinghua
📅 Date
April 17, 2025
📑 Publisher
AAAI 2026
💻 Env
General GUI
🔑 Keywords
TLDR

TongUI turns multimodal web tutorials into large-scale GUI-agent training trajectories by crawling and processing tutorial videos and articles. The resulting GUI-Net dataset spans 143K trajectories across five operating systems and more than 200 applications, and fine-tuning on it improves generalized GUI-agent performance.
