GUI Agents Papers

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Liya Zhu, Jingzhe Ding, Jian Zhang, Jianbo Xue, Shihao Liang, Ge Zhang, Yi Zhu, Duju Zeng, Xiang Gao, Qingshui Gu, Mailun Gao, Huimin Che, Yan Zhao, Peiheng Zhou, Haojun Wang, Chaobo Xian, Lili Le, Chi Wu, Yiwei Liu, Shengda Long, Jiale Yang, Fangzhi Xu, Sijin Wu, Haodong Duan, Chao He, Zhaojian Li, Minchao Wang, Huan Zhou, Jiani Hou, Chuqian Yu, Weiran Shi, Hongwan Gao, Jiamin Chen, Guanhong Chen, Tingqin Luo, Kaiyuan Zhang, Zhixin Yao, Qing Hua, Yuhao Jiang, Jin Chen, Pu Chen, Zhenyu Hu, Xingyu Li, Zhengxuan Jiang, Meng Cao, Tianfeng Long, Haozhe Wang, Mingzhang Wang, Yichen Zhang, Yiming Dai, Chenchen Zhang, Jiaying Wang, Xinying Liu, Xingzu Liu, Lingling Zhang, Xinjie Chen, Yujia Qin, Wangchunshu Zhou, Zhiyong Wu, Yang Liu, Jiaheng Liu, Lei Zhang, Shen Yan, Wenhao Huang, Zaiyuan Wang, Xiaolong Chang

🏛 Institutions
Unknown
📅 Date
June 9, 2026
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

Workflow-GYM is a benchmark for long-horizon computer-use tasks in professional software environments. It evaluates whether agents can complete domain-specific workflows through GUIs and reports that current state-of-the-art models still struggle with end-to-end professional tasks.
