
Autonomous Evaluation and Refinement of Digital Agents

Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

🏛 Institutions
UC Berkeley, University of Michigan
📅 Date
April 9, 2024
📑 Publisher
COLM 2024
💻 Env
Web, Desktop
🔑 Keywords
TLDR

This paper studies domain-general automatic evaluators for web-navigation and device-control agents and shows that they agree with oracle evaluation metrics 74.4% to 92.9% of the time across popular digital-agent benchmarks. It then uses those evaluators for fine-tuning and inference-time guidance, improving performance on WebArena by 29% and achieving a relative improvement of around 75% in device-control settings.
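The two kinds of figures quoted in the summary, agreement with an oracle and relative improvement, can be illustrated with a minimal sketch. The function names, labels, and success rates below are hypothetical and chosen only to show the arithmetic; they are not taken from the paper.

```python
def agreement_rate(evaluator_labels, oracle_labels):
    """Fraction of episodes where the evaluator's verdict matches the oracle's."""
    if len(evaluator_labels) != len(oracle_labels):
        raise ValueError("label lists must have the same length")
    if not oracle_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(e == o for e, o in zip(evaluator_labels, oracle_labels))
    return matches / len(oracle_labels)


def relative_improvement(baseline, improved):
    """Improvement expressed as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical success/failure verdicts for eight episodes (illustration only).
evaluator = [True, False, True, True, False, True, False, True]
oracle = [True, False, False, True, False, True, True, True]

print(f"agreement: {agreement_rate(evaluator, oracle):.1%}")

# Hypothetical success rates before and after refinement (illustration only).
print(f"relative improvement: {relative_improvement(0.20, 0.25):.1%}")
```

A relative figure depends on the baseline: the same absolute gain of five points is a larger relative improvement when the starting success rate is low.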
