
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust

🏛 Institutions
Google DeepMind, University of Tokyo
📅 Date
July 24, 2023
📑 Publisher
ICLR 2024 (Oral)
💻 Env
Web
🔑 Keywords
TLDR

WebAgent is a modular real-world web agent that decomposes instructions into sub-instructions, summarizes long HTML into task-relevant snippets, and executes generated Python programs on websites. The paper pairs that agent design with HTML-T5, a long-context model for HTML planning and summarization.
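The three stages named above (decompose, summarize, execute) can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: `run_episode`, `Step`, the `"DONE"` stop signal, and every callable passed in are hypothetical names chosen for illustration, and the stubs in `demo` stand in for the language models and the browser.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One iteration of the loop (hypothetical record type)."""
    sub_instruction: str
    snippet: str
    program: str


def run_episode(
    instruction: str,
    get_html: Callable[[], str],
    plan: Callable[[str, List[str], str], str],
    summarize: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    execute: Callable[[str], None],
    max_steps: int = 5,
) -> List[Step]:
    """Plan a sub-instruction, extract a relevant snippet, then act on it."""
    history: List[Step] = []
    for _ in range(max_steps):
        html = get_html()
        # 1. Decompose: choose the next sub-instruction from the task so far.
        sub = plan(instruction, [s.sub_instruction for s in history], html)
        if sub == "DONE":  # assumed stop signal, not taken from the paper
            break
        # 2. Summarize: reduce the long page to a task-relevant snippet.
        snippet = summarize(html, sub)
        # 3. Synthesize a program from the two, then run it on the site.
        program = synthesize(sub, snippet)
        execute(program)
        history.append(Step(sub, snippet, program))
    return history


def demo() -> List[Step]:
    """Run the loop with toy stubs in place of models and a browser."""
    page = "<html><body><input id='q'><button id='go'>Search</button></body></html>"
    todo = ["type 'shoes' into the search box", "click the search button"]
    executed: List[str] = []

    def plan(instruction: str, past: List[str], html: str) -> str:
        return todo[len(past)] if len(past) < len(todo) else "DONE"

    def summarize(html: str, sub: str) -> str:
        if "type" in sub:
            return "<input id='q'>"
        return "<button id='go'>Search</button>"

    def synthesize(sub: str, snippet: str) -> str:
        return f"# program for: {sub}"

    return run_episode(
        "search for shoes", lambda: page, plan, summarize, synthesize, executed.append
    )
```

Passing each stage in as a callable keeps the sketch independent of any particular model or browser driver. In a real setup, `plan` and `summarize` would be backed by a long-context HTML model and `execute` would run the generated Python program against a live page.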
