GUI Agents Papers

See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch

Xingyi Zhang, Yulei Ye, Kaifeng Huang, Wenhao Li, Xiangfeng Wang

🏛 Institutions
East China Normal University, Tongji University
📅 Date
February 11, 2026
📑 Publisher
arXiv
💻 Env
General GUI
TLDR

ScratchWorld evaluates multimodal GUI agents on Scratch program-construction tasks that require fine-grained drag-and-drop manipulation. The benchmark reveals a large gap between agents' success at high-level planning and their success at low-level GUI execution.
