GUI Agents Papers

VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking

Shunyu Liu, Minghao Liu, Huichi Zhou, Zhenyu Cui, Yang Zhou, Yuhao Zhou, Jialiang Gao, Heng Zhou, Yunhao Yang, Wendong Fan, Puzhen Zhang, Ge Zhang, Jiajun Shi, Weihao Xuan, Jiaxing Huang, Shuang Luo, Fang Wu, Heli Qi, Qingcheng Zeng, Junjie Wang, Aosong Feng, Jindi Lv, Sicong Jiang, Ziqi Ren, Wangchunshu Zhou, Zhenfei Yin, Wenlong Zhang, Guohao Li, Wenhao Yu, Lei Ma, Lei Bai, Qunshu Lin, Mingli Song, Dacheng Tao

🏛 Institutions
NTU, ZJU, University of Tokyo, Shanghai AI Laboratory, Google DeepMind, University of Alberta
📅 Date
August 6, 2025
📑 Publisher
arXiv
💻 Env
Web
🔑 Keywords
TLDR

VeriWeb is a web benchmark for long-chain information-seeking tasks that decomposes each problem into interdependent, verifiable subtasks instead of relying only on final-answer checks. It contains 302 human-annotated tasks across five domains and is designed to stress both coverage-oriented search and multi-hop context tracking in realistic web environments.
