GUI Agents Papers

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

🏛 Institutions
Show Lab, NUS, Microsoft
📅 Date
June 14, 2024
📑 Publisher
NeurIPS 2024 Datasets and Benchmarks Track
💻 Env
Desktop
TLDR

VideoGUI is a desktop GUI benchmark built from high-quality instructional videos covering visual-centric software such as Photoshop, video editing tools, and Stable Diffusion WebUI. It evaluates assistants at three levels (high-level planning, middle-level action narration, and atomic execution) and finds that even GPT-4o performs poorly on these visually specified tasks.
