WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

Léo Boisvert, Megh Thakkar, Maxime Gasse, Massimo Caccia, Thibault Le Sellier De Chezelles, Quentin Cappart, Nicolas Chapados, Alexandre Lacoste, Alexandre Drouin

🏛 Institutions: ServiceNow Research, Mila, Polytechnique Montréal, Chandar Research Lab
📅 Date: July 7, 2024
📑 Publisher: NeurIPS 2024 Datasets and Benchmarks Track (Poster)
💻 Env: Web
🔑 Keywords: benchmark, dataset, planning, knowledge work, compositional tasks, oracle traces, WorkArena++

TLDR

WorkArena++ is a web benchmark of 682 enterprise knowledge-work tasks, built on ServiceNow, that stresses web agents' compositional planning, retrieval, reasoning, and contextual understanding. Beyond the benchmark itself, it provides a mechanism for generating thousands of oracle observation-action traces that can be used to fine-tune web agents.
