Demo2Tutorial: From Human Experience to Multimodal Software Tutorials

Zechen Bai, Zhiheng Chen, Yiqi Lin, Kevin Qinghong Lin, Difei Gao, Xiangwu Guo, Xin Wang, Mike Zheng Shou

🏛 Institutions: Unknown
📅 Date: June 2, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: software tutorials, task graphs, planning, Demo2Tutorial

TLDR

Demo2Tutorial converts screen recordings and interaction logs into structured multimodal software tutorials with parsed actions, intents, and hierarchical task graphs. The paper evaluates tutorial generation quality and shows that the resulting representations improve downstream GUI-agent planning and generalization.
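As a rough illustration of what a tutorial with parsed actions, intents, and a hierarchical task graph could look like as data, here is a minimal Python sketch. The class names (`Step`, `TaskNode`), their fields, and the example task are all hypothetical and chosen for illustration only; they are not taken from the paper and may differ from its actual representation.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One parsed action plus the intent behind it (hypothetical fields)."""
    action: str
    intent: str


@dataclass
class TaskNode:
    """A node in a hierarchical task graph: a goal, its own steps, and subtasks."""
    goal: str
    steps: list[Step] = field(default_factory=list)
    subtasks: list["TaskNode"] = field(default_factory=list)

    def flatten(self) -> list[Step]:
        """Return all steps depth-first, a node's own steps before its subtasks."""
        ordered = list(self.steps)
        for sub in self.subtasks:
            ordered.extend(sub.flatten())
        return ordered


# Invented example data for illustration, not from the paper.
tutorial = TaskNode(
    goal="Export a slide deck as PDF",
    subtasks=[
        TaskNode(
            goal="Open the export dialog",
            steps=[
                Step(action="click File menu", intent="reach export options"),
                Step(action="click Export", intent="open the export dialog"),
            ],
        ),
        TaskNode(
            goal="Save as PDF",
            steps=[Step(action="select PDF and click Save", intent="write the file")],
        ),
    ],
)
```

Calling `tutorial.flatten()` yields the linear step sequence a planner might follow, while the nested `subtasks` keep the higher-level goals available for reuse on related tasks.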
