GUI Agents Papers

Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus

Gang Li, Yang Li

🏛 Institutions
Google Research
📅 Date
September 29, 2022
📑 Publisher
ICLR 2023 (Poster)
💻 Env
Mobile
🔑 Keywords
TLDR

Spotlight is a vision-only mobile UI understanding model that takes a screenshot plus a region of interest (the "focus") instead of relying on view-hierarchy input. It is pretrained on about 2.5 million mobile UI screens and then applied to widget captioning, screen summarization, command grounding, and related UI modeling tasks.
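The key input idea above is a screenshot paired with a region of interest. As a hedged illustration only, the sketch below shows one way such an input could be packaged, with the region given as a pixel bounding box and then normalized to the screenshot size. The class name `FocusedUIExample`, its field names, and the task strings are hypothetical and do not come from the paper or any released code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FocusedUIExample:
    """Hypothetical container: a screenshot, a focus region, and a task name."""

    screenshot_size: Tuple[int, int]        # (width, height) in pixels
    bbox_px: Tuple[int, int, int, int]      # (left, top, right, bottom) in pixels
    task: str                               # illustrative label, e.g. "widget_captioning"

    def normalized_bbox(self) -> Tuple[float, float, float, float]:
        """Scale the pixel box into [0, 1] coordinates relative to the screenshot."""
        w, h = self.screenshot_size
        left, top, right, bottom = self.bbox_px
        # Reject empty boxes or boxes that extend past the screenshot edges.
        if not (0 <= left < right <= w and 0 <= top < bottom <= h):
            raise ValueError("bounding box must be non-empty and lie inside the screenshot")
        return (left / w, top / h, right / w, bottom / h)


if __name__ == "__main__":
    example = FocusedUIExample((1080, 2400), (108, 240, 540, 480), "widget_captioning")
    print(example.normalized_bbox())
```

Normalizing the box makes the focus description independent of device resolution, which is one plausible reason to prefer relative coordinates; whether Spotlight encodes its focus exactly this way is not stated here.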
