UIBert: Learning Generic Multimodal Representations for UI Understanding

Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas

🏛 Institutions: Dartmouth College, Google Research
📅 Date: July 29, 2021
📑 Publisher: IJCAI 2021
💻 Env: Mobile
🔑 Keywords: model pretraining, multimodal representation learning, self-alignment, UIBert

TLDR

UIBert is a transformer model for UI understanding trained with five UI-specific pretraining tasks over screenshots, text, and structural metadata. Its core idea is that the heterogeneous modalities inside a UI are self-aligned and can supervise one another to learn generic UI representations.
