GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
- 🏛 Institutions
  - OSU
- 📅 Date
  - January 3, 2024
- 📑 Publisher
  - ICML 2024
- 💻 Env
  - Web
- 🔑 Keywords
TLDR
SeeAct studies GPT-4V as a generalist web agent and adds an online evaluation setup for running agents on live websites. It shows that GPT-4V performs well when its textual plans are manually grounded into actions on the page, and it identifies automatic grounding as the main remaining bottleneck.
Related papers
- Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective · March 15, 2026 · arXiv
- Enhancing Web Agents with a Hierarchical Memory Tree · March 7, 2026 · arXiv
- OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding · February 25, 2026 · arXiv
- ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution · January 12, 2026 · arXiv
- WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation · October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv