TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents

Bofei Zhang, Zirui Shang, Zhi Gao, Wang Zhang, Rui Xie, Xiaojian Ma, Tao Yuan, Xinxiao Wu, Song-Chun Zhu, Qing Li

🏛 Institutions: State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing Institute of Technology, PKU, SJTU, Tsinghua
📅 Date: April 17, 2025
📑 Publisher: AAAI 2026
💻 Env: General GUI
🔑 Keywords: dataset, tutorial mining, trajectory generation, GUI-Net, TongUI

TLDR

TongUI turns multimodal web tutorials into large-scale GUI-agent training trajectories by crawling and processing tutorial videos and articles. The resulting GUI-Net dataset spans 143K trajectories across five operating systems and more than 200 applications, and fine-tuning on it improves generalized GUI-agent performance.
