GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
Saelyne Yang, Jaesang Yu, Yi-Hao Peng, Kevin Qinghong Lin, Jae Won Cho, Yale Song, Juho Kim
- 🏛 Institutions
- KAIST, CMU, Oxford, Konkuk University, Google, SkillBench
- 📅 Date
- March 26, 2026
- 📑 Publisher
- CVPR 2026
- 💻 Env
- General GUI
- 🔑 Keywords
TLDR
GUIDE studies collaborative GUI assistance rather than pure task automation, using 67.5 hours of think-aloud recordings from 120 novice users across 10 software applications. It benchmarks behavior-state detection, intent prediction, and help prediction, and shows that current multimodal models still struggle to infer what users are doing and when intervention would be useful.
Related papers
- GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis · April 6, 2026 · arXiv
- AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark · April 27, 2026 · arXiv
- GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models · April 15, 2026 · arXiv
- CocoaBench: Evaluating Unified Digital Agents in the Wild · April 13, 2026 · arXiv
- What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning · April 8, 2026 · Findings of ACL 2026
- See, Plan, Snap: Evaluating Multimodal GUI Agents in Scratch · February 11, 2026 · arXiv