Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus

🏛 Institutions: Google Research
📅 Date: September 29, 2022
📑 Publisher: ICLR 2023 (Poster)
💻 Env: Mobile
🔑 Keywords: model, dataset, Spotlight, focus region, vision-only, UI understanding

TLDR

Spotlight is a vision-only mobile UI understanding model that takes a screenshot plus a region of interest instead of relying on view hierarchy input. It is pretrained on about 2.5 million mobile UI screens and then used for widget captioning, screen summarization, command grounding, and related UI modeling tasks.
