VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks
Beitong Zhou, Zhexiao Huang, Yuan Guo, Zhangxuan Gu, Tianyu Xia, Zichen Luo, Fei Tang, Dehan Kong, Yanyi Shang, Suling Ou, Zhenlin Guo, Changhua Meng, Shuheng Shen
- 🏛 Institutions
- Venus Team, Ant Group, iMean AI
- 📅 Date
- December 18, 2025
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop Mobile Web
- 🔑 Keywords
TLDR
VenusBench-GD is a bilingual GUI grounding benchmark spanning mobile, desktop, and web platforms, and organizes evaluation into basic and advanced grounding tasks. The paper uses this hierarchy to show that general-purpose multimodal models have mostly caught up on basic grounding, while advanced tasks still expose substantial reasoning and robustness gaps.
Related papers
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI AgentsJanuary 17, 2024 · ACL 2024
- GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI AgentsMarch 16, 2026 · CVPR 2026
- OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic ModelsDecember 18, 2025 · arXiv
- ScaleTrack: Scaling and back-tracking Automated GUI AgentsMay 1, 2025 · arXiv
- On the Robustness of GUI Grounding Models Against Image AttacksApril 7, 2025 · arXiv
- ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer UseApril 4, 2025 · ACM Multimedia 2025