GUI Agents Papers

UIBert: Learning Generic Multimodal Representations for UI Understanding

Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas

🏛 Institutions
Dartmouth College, Google Research
📅 Date
July 29, 2021
📑 Publisher
IJCAI 2021
💻 Env
Mobile
🔑 Keywords
TLDR

UIBert is a transformer model for UI understanding trained with five UI-specific pretraining tasks over screenshots, text, and structural metadata. Its core idea is that the heterogeneous modalities inside a UI are self-aligned and can supervise one another to learn generic UI representations.
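The self-alignment idea can be illustrated with a small contrastive sketch: embeddings of one UI modality (e.g. image regions) are pulled toward the embeddings of their paired counterparts in another modality (e.g. OCR text on the same component) and pushed away from mismatched pairs. This is an illustrative InfoNCE-style analogue, not the paper's exact five pretraining objectives; the function name and temperature value are assumptions.

```python
import numpy as np

def cross_modal_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Illustrative contrastive loss: row i of image_emb is assumed to be
    the true pair of row i in text_emb (self-aligned UI modalities).
    Not UIBert's actual objective -- a generic InfoNCE-style sketch."""
    # L2-normalize each embedding so similarity is cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity logits; the diagonal holds the matching pairs
    logits = img @ txt.T / temperature
    # Numerically stable log-softmax over each row
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (true pair) as the target
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned = cross_modal_alignment_loss(emb, emb)            # matched pairs: low loss
shuffled = cross_modal_alignment_loss(emb, np.roll(emb, 1, axis=0))  # mismatched: high loss
```

Minimizing such a loss jointly over screenshot regions, text, and structural metadata is one way the heterogeneous modalities can supervise one another without human labels.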
