
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

Pei Yang, Hai Ci, Mike Zheng Shou

🏛 Institutions
Show Lab, NUS
📅 Date
June 4, 2025
📑 Publisher
NeurIPS 2025 (Poster)
💻 Env
Desktop
TLDR

macOSWorld is the first interactive benchmark for GUI agents on macOS. It covers 202 multilingual tasks across 30 applications and includes a dedicated safety subset for deception attacks. The evaluation shows large performance gaps between proprietary and open-source agents, substantial multilingual degradation, and unresolved safety weaknesses on macOS-specific workflows.
