GUI Agents Papers

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su

🏛 Institutions
OSU
📅 Date
January 3, 2024
📑 Publisher
ICML 2024
💻 Env
Web
🔑 Keywords
TLDR

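SeeAct is a generalist web agent built on GPT-4V and evaluated on Mind2Web. Beyond the standard offline evaluation on cached websites, the paper adds an online evaluation setting for running agents on live websites. GPT-4V performs strongly when its textual plans are manually grounded into actions on the page. This result identifies grounding as the main remaining bottleneck.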
