GUI Agents Papers

Harnessing Webpage UIs for Text-Rich Visual Understanding

Junpeng Liu, Tianyue Ou, Yifan Song, Yuxiao Qu, Wai Lam, Chenyan Xiong, Wenhu Chen, Graham Neubig, Xiang Yue

🏛 Institutions
CMU, CUHK, PKU, University of Waterloo
📅 Date
October 17, 2024
📑 Publisher
ICLR 2025 (Poster)
💻 Env
Web
TLDR

This paper builds MultiUI, a 7.3M-sample dataset synthesized from 1M websites by pairing webpage screenshots with instructions generated from cleaned accessibility trees. Training on MultiUI improves web UI understanding and also transfers to broader text-rich visual tasks such as OCR, document understanding, and chart interpretation.
