Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision
A. Said Gurbuz, Sunghwan Hong, Ahmed Nassar, Marc Pollefeys, Peter Staar
- 🏛 Institutions
- IBM Research, ETH, KAIST
- 📅 Date
- February 15, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
This paper argues that sparse grounding supervision is insufficient for GUI understanding and introduces ScreenParse, a large-scale densely annotated screen-parsing dataset. It provides complete UI-element supervision across web screenshots to support richer grounding and UI understanding models.
Related papers
- OmniParser for Pure Vision Based GUI AgentAugust 1, 2024 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text DraggingNovember 7, 2025 · arXiv
- Scaling Computer‑Use Grounding via User Interface Decomposition and SynthesisMay 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)
- UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction SynthesisApril 15, 2025 · Findings of ACL 2025
- EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic DataOctober 25, 2024 · arXiv
- ScreenAI: A Vision-Language Model for UI and Infographics UnderstandingFebruary 7, 2024 · IJCAI 2024