The Amazing Agent Race: Strong Tool Users, Weak Navigators

Zae Myung Kim, Dongseok Lee, Jaehyung Kim, Vipul Raheja, Dongyeop Kang

🏛 Institutions: University of Minnesota
📅 Date: April 11, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: benchmark, DAG puzzles, navigation, tool use, Wikipedia, AAR

TLDR

The Amazing Agent Race (AAR) introduces 1,400 DAG-puzzle legs that require fork-merge tool chains over Wikipedia, separating navigation ability from tool-use ability. The best agent reaches only 37.2%. Navigation errors dominate (27-52% of trials) while tool-use errors stay below 17%, revealing a navigation blind spot that linear benchmarks cannot see.
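
To make the fork-merge idea concrete, here is a minimal sketch of how one leg might be represented as a DAG and told apart from a linear chain. The step names, the `leg` dictionary, and the `is_linear` helper are all hypothetical illustrations and are not taken from the paper or its released data.

```python
from graphlib import TopologicalSorter

# Hypothetical leg: each step maps to the steps it depends on.
# The names are illustrative only; they do not come from the benchmark.
leg = {
    "start_page": [],
    "birth_year": ["start_page"],       # fork, branch A
    "city_population": ["start_page"],  # fork, branch B
    "answer": ["birth_year", "city_population"],  # merge of both branches
}


def is_linear(dag):
    """Return True if no step has more than one predecessor or successor."""
    successors = {node: 0 for node in dag}
    for preds in dag.values():
        if len(preds) > 1:
            return False  # a merge point
        for pred in preds:
            successors[pred] += 1
    # A fork shows up as a step with more than one successor.
    return all(count <= 1 for count in successors.values())


# A valid execution order must visit dependencies before dependents.
order = list(TopologicalSorter(leg).static_order())
```

Under these assumptions, `is_linear(leg)` is false because `start_page` forks into two branches that later merge at `answer`, whereas a plain chain such as `{"a": [], "b": ["a"], "c": ["b"]}` would count as linear.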
