ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu

🏛 Institutions
HKU, Shanghai AI Laboratory, Fudan, PKU, NJU, East China Normal University, Yale University
📅 Date
May 26, 2025
📑 Publisher
ICLR 2026 (Poster)
💻 Env
Desktop
TLDR

ScienceBoard introduces a realistic scientific environment and a 169-task benchmark spanning six domains, built on integrated professional software and workflows that mix different interfaces. Current multimodal agents evaluated on it achieve only about 15% overall success, showing that reliable autonomous scientific assistance remains far off.
