GUI Agents Papers

OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill Yuchen Lin, Jingbo Shang

🏛 Institutions
UC San Diego, UCLA, Allen Institute for AI
📅 Date
July 26, 2024
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

OfficeBench is a benchmark for office automation tasks that require agents to plan across multiple applications, switch contexts correctly, and ground actions in a large combined action space. The paper reports a pass rate of only 47% for GPT-4 Omni and identifies redundant operations, hallucinations, and application-switching errors as the core failure modes.
