UIBert: Learning Generic Multimodal Representations for UI Understanding
Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Agüera y Arcas
- 🏛 Institutions
- Dartmouth College, Google Research
- 📅 Date
- July 29, 2021
- 📑 Publisher
- IJCAI 2021
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
UIBert is a transformer-based model for UI understanding, pretrained with five UI-specific tasks over screenshots, on-screen text, and structural metadata (view hierarchies). Its core idea is that the heterogeneous modalities inside a UI are naturally self-aligned, so they can supervise one another to learn generic UI representations.
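To make the self-alignment idea concrete, here is a minimal sketch (not UIBert's exact objective) of an in-batch contrastive loss that ties each UI element's image-region embedding to the text embedding of the same element, using the UI's own image-text pairing as free supervision. The function name, tensor shapes, and temperature value are illustrative assumptions, not from the paper.

```python
import torch
import torch.nn.functional as F

def self_alignment_loss(image_emb: torch.Tensor,
                        text_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """In-batch contrastive loss: each UI element's image-region embedding
    is paired with the text embedding of the same element (positive pair);
    all other elements in the batch serve as negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # [N, N] similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match images to texts and texts to images.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: embeddings for a batch of 32 UI elements, dimension 128.
img = torch.randn(32, 128)
txt = torch.randn(32, 128)
loss = self_alignment_loss(img, txt)
```

The design point this illustrates is that no human labels are needed: because the screenshot region and its text come from the same UI component, the pairing itself is the supervision signal.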
Related papers
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- SecAgent: Efficient Mobile GUI Agent with Semantic Context · March 9, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics · February 11, 2026 · arXiv
- OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution · January 28, 2026 · arXiv
- MAI-UI Technical Report: Real-World Centric Foundation GUI Agents · December 26, 2025 · arXiv