GUI Agents Papers

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Anjie Yang, Zhaoxuan Jin, Jianbo Deng, Philip Torr, Bernard Ghanem, Guohao Li

🏛 Institutions
KAUST, Eigent.AI, CAMEL-AI.org, UTokyo, CMU, Stanford, Harvard, Tsinghua, SUSTech, Oxford, NU
📅 Date
July 1, 2024
📑 Publisher
Findings of ACL 2025
💻 Env
Desktop, Mobile
🔑 Keywords
TLDR

CRAB is a benchmark framework for multimodal agents that supports cross-environment tasks and graph-based fine-grained evaluation instead of single-platform end-state scoring. Its CRAB Benchmark-v0 release contains 120 desktop and mobile tasks, and the paper reports a best completion ratio of 38.01% from a single GPT-4o agent.
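
To illustrate how graph-based, fine-grained scoring differs from end-state scoring, here is a minimal sketch. It is not CRAB's actual API: the function `completion_ratio`, the node names, and the rule that a checkpoint counts only once all of its prerequisites are completed are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def completion_ratio(predecessors, passed):
    """Fraction of checkpoint nodes completed in a task graph.

    predecessors: dict mapping every node to the set of nodes it depends on.
    passed: set of nodes whose own check succeeded.
    Assumption: a node counts as completed only if its check passed
    and all of its prerequisite nodes are completed.
    """
    completed = set()
    # static_order() yields prerequisites before the nodes that depend on them.
    for node in TopologicalSorter(predecessors).static_order():
        if node in passed and all(p in completed for p in predecessors[node]):
            completed.add(node)
    return len(completed) / len(predecessors)


# Hypothetical cross-environment task: read text on a phone, save it on a desktop.
example_graph = {
    "open_message_on_phone": set(),
    "copy_text": {"open_message_on_phone"},
    "paste_in_desktop_editor": {"copy_text"},
    "save_file": {"paste_in_desktop_editor"},
}
# The paste step failed, so save_file does not count even though its check passed.
example_passed = {"open_message_on_phone", "copy_text", "save_file"}

ratio = completion_ratio(example_graph, example_passed)
print(ratio)  # 0.5
```

An end-state scorer would give this run 0, because the final file is wrong. The graph-based ratio still credits the two steps that were completed in order.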
