GUI Agents Papers

AgentBench: Evaluating LLMs as Agents

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang

🏛 Institutions
Tsinghua University, The Ohio State University, ByteDance
📅 Date
January 1, 2024
📑 Publisher
ICLR 2024
💻 Env
🔑 Keywords
TLDR

Introduces AgentBench, a broad benchmark suite for evaluating LLMs as agents across eight environments, of which OS interaction is only one. It is included in this repo mainly as a general agent-evaluation reference rather than as a GUI-specific benchmark.
