EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data
Xuetian Chen, Hangcheng Li, Jiaqing Liang, Sihang Jiang, Deqing Yang
- 🏛 Institutions
- Fudan
- 📅 Date
- October 25, 2024
- 📑 Publisher
- arXiv
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
EDGE is a synthetic-data pipeline for GUI understanding that generates large-scale, multi-granularity supervision from webpages. Models trained on the resulting dataset improve at webpage understanding, and that gain transfers to previously unseen desktop and mobile GUI environments, with much less manual annotation required.
Related papers (24)
- Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents · October 7, 2024 · ICLR 2025 (Oral)
- Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision · February 15, 2026 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging · November 7, 2025 · arXiv
- Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis · May 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)
- UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis · April 15, 2025 · Findings of ACL 2025
- OmniParser for Pure Vision Based GUI Agent · August 1, 2024 · arXiv
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · October 30, 2024 · ICLR 2025 (Spotlight)
- MobileViews: A Million-scale and Diverse Mobile GUI Dataset · September 22, 2024 · arXiv
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · January 17, 2024 · ACL 2024
- GUI-C²: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning · May 29, 2026 · arXiv
- Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining · May 14, 2026 · arXiv
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding · April 15, 2026 · arXiv
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models · April 15, 2026 · arXiv
- See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback · April 14, 2026 · arXiv
- What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning · April 8, 2026 · Findings of ACL 2026
- Towards GUI Agents: Vision-Language Diffusion Models for GUI Grounding · March 27, 2026 · CVPR 2026
- AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement · March 18, 2026 · arXiv
- Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements · March 15, 2026 · arXiv
- Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion · February 6, 2026 · arXiv
- POINTS-GUI-G: GUI-Grounding Journey · February 6, 2026 · arXiv
- SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization · January 30, 2026 · arXiv
- GUIGuard: Toward a General Framework for Privacy-Preserving GUI Agents · January 26, 2026 · arXiv
- V2P: Visual Attention Calibration for GUI Grounding via Background Suppression and Center Peaking · January 11, 2026 · arXiv
- MVP: Multiple View Prediction Improves GUI Grounding · December 9, 2025 · arXiv