VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks
Beitong Zhou , Zhexiao Huang , Yuan Guo , Zhangxuan Gu , Tianyu Xia , Zichen Luo , Fei Tang , Dehan Kong , Yanyi Shang , Suling Ou , Zhenlin Guo , Changhua Meng , Shuheng Shen
- 🏛 Institutions
- Venus Team , Ant Group , iMean AI
- 📅 Date
- December 18, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop Mobile Web
- 🔑 Keywords
TLDR
VenusBench-GD is a bilingual GUI grounding benchmark spanning mobile, desktop, and web platforms, and organizes evaluation into basic and advanced grounding tasks. The paper uses this hierarchy to show that general-purpose multimodal models have mostly caught up on basic grounding, while advanced tasks still expose substantial reasoning and robustness gaps.
Related papers (24)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI AgentsJanuary 17, 2024 · ACL 2024
- GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI AgentsMarch 16, 2026 · CVPR 2026
- OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic ModelsDecember 18, 2025 · arXiv
- ScaleTrack: Scaling and back-tracking Automated GUI AgentsMay 1, 2025 · arXiv
- On the Robustness of GUI Grounding Models Against Image AttacksApril 7, 2025 · arXiv
- ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer UseApril 4, 2025 · ACM Multimedia 2025
- UI-TARS: Pioneering Automated GUI Interaction with Native AgentsJanuary 21, 2025 · arXiv
- Ponder & Press: Advancing Visual GUI Agent towards General Computer ControlDecember 2, 2024 · Findings of ACL 2025
- Improved GUI Grounding via Iterative NarrowingNovember 18, 2024 · arXiv
- OS-ATLAS: A Foundation Action Model for Generalist GUI AgentsOctober 30, 2024 · ICLR 2025 (Spotlight)
- TinyClick: Single-Turn Agent for Empowering GUI AutomationOctober 9, 2024 · INTERSPEECH 2025
- Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI AgentsOctober 7, 2024 · ICLR 2025 (Oral)
- GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented UnderstandingJune 16, 2024 · ICLR 2025 (Poster)
- VideoGUI: A Benchmark for GUI Automation from Instructional VideosJune 14, 2024 · NeurIPS 2024 Datasets and Benchmarks Track
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding ModelsApril 15, 2026 · arXiv
- What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI ReasoningApril 8, 2026 · Findings of ACL 2026
- PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation AgentsMarch 9, 2026 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text DraggingNovember 7, 2025 · arXiv
- NaturalGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory DatasetAugust 2, 2025 · arXiv
- RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use AgentsMay 31, 2025 · NeurIPS 2025 (Poster)
- RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS EnvironmentsMay 28, 2025 · ICLR 2026 (Oral)
- Scaling Computer‑Use Grounding via User Interface Decomposition and SynthesisMay 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)
- UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction SynthesisApril 15, 2025 · Findings of ACL 2025
- WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting PointFebruary 12, 2025 · arXiv