GUI Agents Papers

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, Shixiang Shane Gu, Izzeddin Gur

🏛 Institutions
University of Tokyo, Google DeepMind
📅 Date
May 19, 2023
📑 Publisher
ICLR 2024
💻 Env
Web
TLDR

This paper studies offline training of multimodal web agents with WebGUM, an agent that takes both webpage screenshots and HTML as input and outputs web navigation actions. It also releases a dataset of 347K demonstrations and reports strong gains on MiniWoB and WebShop, along with positive transfer to Mind2Web.
