ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use
Kaixin Li, Ziyang Meng, Hongzhan Lin, Ziyang Luo, Yuchen Tian, Jing Ma, Zhiyong Huang, Tat-Seng Chua
- 🏛 Institutions
- NUS, East China Normal University, Hong Kong Baptist University
- 📅 Date
- April 4, 2025
- 📑 Publisher
- ACM Multimedia 2025
- 💻 Env
- Desktop
- 🔑 Keywords
TLDR
ScreenSpot-Pro benchmarks GUI grounding in professional high-resolution computer-use settings with 1,581 tasks across 23 applications, five industries, and three operating systems. The paper also proposes ScreenSeekeR, a cascaded visual search method guided by planner knowledge, and shows that current grounding models remain weak in these professional environments.
Related papers (24)
- VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks · December 18, 2025 · arXiv
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · January 17, 2024 · ACL 2024
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models · April 15, 2026 · arXiv
- What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning · April 8, 2026 · Findings of ACL 2026
- POINTS-GUI-G: GUI-Grounding Journey · February 6, 2026 · arXiv
- Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging · November 7, 2025 · arXiv
- GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding · October 5, 2025 · arXiv
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · July 29, 2025 · CVPR 2026 Findings
- Scaling Computer‑Use Grounding via User Interface Decomposition and Synthesis · May 19, 2025 · NeurIPS 2025 Datasets and Benchmarks Track (Spotlight)
- UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis · April 15, 2025 · Findings of ACL 2025
- Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields · June 9, 2026 · arXiv
- WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments · April 30, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents · April 12, 2026 · arXiv
- HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks · April 10, 2026 · arXiv
- Gym-Anything: Turn any Software into an Agent Environment · April 7, 2026 · arXiv
- HippoCamp: Benchmarking Contextual Agents on Personal Computers · April 1, 2026 · arXiv
- PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents · March 9, 2026 · arXiv
- OSExpert: Computer-Use Agents Learning Professional Skills via Exploration · March 9, 2026 · arXiv
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents · February 9, 2026 · arXiv
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents · February 9, 2026 · arXiv
- OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks · January 28, 2026 · arXiv
- EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents · January 25, 2026 · arXiv
- MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction · January 19, 2026 · arXiv
- ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands · December 31, 2025 · arXiv