GUI Agents Papers

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu

🏛 Institutions
HKU, Shanghai AI Laboratory, Fudan, PKU, NJU, East China Normal University, Yale University
📅 Date
May 26, 2025
📑 Publisher
ICLR 2026 (Poster)
💻 Env
Desktop
🔑 Keywords
TLDR

ScienceBoard introduces a realistic scientific environment and a 169-task benchmark spanning six domains, with integrated professional software and mixed-interface workflows. In evaluations, current multimodal agents reach only about 15% overall success, showing that autonomous scientific assistance remains far from reliable.
