WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

Jinchao Li, Yunxin Li, Chenrui Zhao, Zhenran Xu, Baotian Hu, Min Zhang

🏛 Institutions: HIT-Shenzhen
📅 Date: April 30, 2026
📑 Publisher: arXiv
💻 Env: Desktop
🔑 Keywords: benchmark, process-centric, cross-application, Windows, WindowsWorld

TLDR

WindowsWorld addresses a gap in existing GUI benchmarks, which focus on isolated single-application tasks. It presents a process-centric suite of 181 cross-application desktop tasks (averaging 5.0 sub-goals each, spanning 17 applications, 78% of them multi-application). Evaluated computer-use agents fall below 21% success on multi-application tasks, substantially trailing their single-application performance and exposing weak workflow-level coordination.
