Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

A. Said Gurbuz, Sunghwan Hong, Ahmed Nassar, Marc Pollefeys, Peter Staar

🏛 Institutions: IBM Research, ETH, KAIST
📅 Date: February 15, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: GUI grounding, dataset, screen parsing, dense supervision, ScreenParse, UI understanding

TLDR

This paper argues that sparse grounding supervision is insufficient for GUI understanding and introduces ScreenParse, a large-scale densely annotated screen-parsing dataset. It provides complete UI-element supervision across web screenshots to support richer grounding and UI understanding models.


Related papers (24)

OmniParser for Pure Vision Based GUI Agent
August 1, 2024 · arXiv

Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging
November 7, 2025 · arXiv

Scaling Computer‑Use Grounding via User Interface Decomposition and Synthesis
May 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)

UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
April 15, 2025 · Findings of ACL 2025

EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
October 25, 2024 · arXiv

ScreenAI: A Vision-Language Model for UI and Infographics Understanding
February 7, 2024 · IJCAI 2024

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
October 30, 2024 · ICLR 2025 (Spotlight)

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
October 7, 2024 · ICLR 2025 (Oral)

MobileViews: A Million-scale and Diverse Mobile GUI Dataset
September 22, 2024 · arXiv

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
January 17, 2024 · ACL 2024

GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning
May 29, 2026 · arXiv

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
May 14, 2026 · arXiv

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
April 15, 2026 · arXiv

GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
April 15, 2026 · arXiv

See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback
April 14, 2026 · arXiv

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning
April 8, 2026 · Findings of ACL 2026

Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding
March 27, 2026 · CVPR 2026

AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
March 18, 2026 · arXiv

Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements
March 15, 2026 · arXiv

Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion
February 6, 2026 · arXiv

POINTS-GUI-G: GUI-Grounding Journey
February 6, 2026 · arXiv

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
January 30, 2026 · arXiv

GUIGuard: Toward a General Framework for Privacy-Preserving GUI Agents
January 26, 2026 · arXiv

V2P: Visual Attention Calibration for GUI Grounding via Background Suppression and Center Peaking
January 11, 2026 · arXiv