GUI Agents Papers

See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch

Xingyi Zhang, Yulei Ye, Kaifeng Huang, Wenhao Li, Xiangfeng Wang

🏛 Institutions
East China Normal University, Tongji University
📅 Date
February 11, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

ScratchWorld evaluates multimodal GUI agents on Scratch program construction tasks that require fine-grained drag-and-drop manipulation. The benchmark exposes a large gap between high-level planning success and low-level GUI execution.
