GUI Agents Papers

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

Pei Yang, Hai Ci, Mike Zheng Shou

🏛 Institutions
Show Lab, NUS
📅 Date
June 4, 2025
📑 Publisher
NeurIPS 2025 (Poster)
💻 Env
Desktop
TLDR

macOSWorld is the first interactive benchmark for GUI agents on macOS. It comprises 202 multilingual tasks across 30 applications, plus a dedicated safety subset that tests agents against deception attacks. The evaluation shows large performance gaps between proprietary and open-source agents, substantial degradation in multilingual settings, and unresolved safety weaknesses on macOS-specific workflows.
