Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
- 🏛 Institutions
- Google Research
- 📅 Date
- September 29, 2022
- 📑 Publisher
- ICLR 2023 (Poster)
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
Spotlight is a vision-only mobile UI understanding model. Instead of relying on view hierarchy input, it takes a screenshot plus a region of interest (the "focus") on that screen. It is pretrained on about 2.5 million mobile UI screens and then applied to UI modeling tasks such as widget captioning, screen summarization, command grounding, and tappability prediction.
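The "screenshot plus a region of interest" input can be made concrete with a minimal sketch of how one query might be packaged. The names `FocusQuery` and `make_focus_query`, the `(x0, y0, x1, y1)` box convention, and normalizing coordinates to [0, 1] are all illustrative assumptions, not Spotlight's released interface.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FocusQuery:
    """Hypothetical container for one Spotlight-style query (not the paper's API)."""
    screenshot_size: Tuple[int, int]               # (width, height) in pixels
    focus_box: Tuple[float, float, float, float]   # assumed normalized (x0, y0, x1, y1) in [0, 1]
    task_prompt: str                               # e.g. "widget captioning"

def make_focus_query(width: int, height: int,
                     box_px: Tuple[int, int, int, int],
                     task_prompt: str) -> FocusQuery:
    """Normalize a pixel-space box by the screenshot size (an assumed convention)."""
    x0, y0, x1, y1 = box_px
    # Reject boxes that fall outside the screenshot or have zero area.
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError("box must lie inside the screenshot and have positive area")
    box = (x0 / width, y0 / height, x1 / width, y1 / height)
    return FocusQuery((width, height), box, task_prompt)

# Usage: focus on the lower-right region of a 1080x1920 screenshot.
q = make_focus_query(1080, 1920, (540, 960, 1080, 1200), "widget captioning")
print(q.focus_box)  # (0.5, 0.5, 1.0, 0.625)
```

Normalizing the box keeps the region description independent of screen resolution, so the same query format can be reused across devices; how Spotlight itself encodes the focus region is described in the paper.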
Related papers (24)
- SecAgent: Efficient Mobile GUI Agent with Semantic Context · March 9, 2026 · arXiv
- UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents · May 27, 2025 · NeurIPS 2025 (Poster)
- SpiritSight Agent: Advanced GUI Agent with One Look · March 5, 2025 · CVPR 2025 (Poster)
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent · November 26, 2024 · CVPR 2025 (Poster)
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · October 30, 2024 · ICLR 2025 (Spotlight)
- MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding · September 23, 2024 · Findings of EMNLP 2024
- CogAgent: A Visual Language Model for GUI Agents · December 14, 2023 · CVPR 2024 (Highlight)
- MolmoWeb: Open Visual Web Agent and Open Data for the Open Web · April 9, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands · December 31, 2025 · arXiv
- OpenCUA: Open Foundations for Computer-Use Agents · August 12, 2025 · NeurIPS 2025 (Spotlight)
- Web-Shepherd: Advancing PRMs for Reinforcing Web Agents · May 21, 2025 · NeurIPS 2025 (Spotlight)
- Efficient Agent Training for Computer Use · May 20, 2025 · ICLR 2026 (Poster)
- STEVE: A Step Verification Pipeline for Computer-use Agent Training · March 16, 2025 · arXiv
- Falcon-UI: Understanding GUI Before Following User Instructions · December 12, 2024 · arXiv
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction · December 5, 2024 · ICML 2025 (Poster)
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding · February 7, 2024 · IJCAI 2024
- Multimodal Web Navigation with Instruction-Finetuned Foundation Models · May 19, 2023 · ICLR 2024
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent · March 31, 2026 · arXiv
- Video-Based Reward Modeling for Computer-Use Agents · March 10, 2026 · arXiv
- Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization · February 24, 2026 · arXiv
- Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents · February 15, 2026 · arXiv
- AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild · February 12, 2026 · arXiv
- UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics · February 11, 2026 · arXiv