NaturalGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset

Zihan Zheng, Tianle Cui, Chuwen Xie, Jiahui Zhang, Jiahui Pan, Lewei He, Qianglong Chen

🏛 Institutions: South China Normal University, ZJU
📅 Date: August 2, 2025
📑 Publisher: arXiv
💻 Env: Desktop, Mobile
🔑 Keywords: benchmark, dataset, causal pathways, LightManus, NaturalGAIA

TLDR

NaturalGAIA is a GUI benchmark that decomposes long-horizon tasks into causally structured, programmatically verifiable atomic steps and evaluates them with Weighted Pathway Success Rate. The paper pairs the benchmark with a human-verified trajectory dataset collected through the hierarchical LightManus framework and shows that even strong models struggle on the resulting desktop-and-mobile tasks.
