GUI Agents Papers

OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill Yuchen Lin, Jingbo Shang

🏛 Institutions
UC San Diego, UCLA, Allen Institute for AI
📅 Date
July 26, 2024
📑 Publisher
arXiv
💻 Env
Desktop
TLDR

OfficeBench is a benchmark for office automation tasks that require agents to plan across multiple applications, switch between them at the right time, and ground their actions in a large combined action space. The paper reports a pass rate of only 47% for GPT-4 Omni and highlights redundant operations, hallucination, and application-switching errors as the core failure modes.
