GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation
Rui Xie, Zhi Gao, Chenrui Shi, Zirui Shang, Lu Chen, Qing Li
- 🏛 Institutions
- SJTU, State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing Institute of Technology
- 📅 Date
- March 27, 2026
- 📑 Publisher
- arXiv
- 💻 Env
- Desktop
TLDR
GUIDE is a training-free add-on for desktop GUI agents. It retrieves relevant tutorial videos, converts them into planning and grounding annotations, and injects that expertise into existing agents without changing model parameters. On OSWorld, it improves the performance of multiple agent families while also reducing the number of execution steps.
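The retrieve, annotate, and inject flow described in the TLDR can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Tutorial` record, the word-overlap `score`, and the `[planning]`/`[grounding]` prompt tags are hypothetical, and real video retrieval and annotation are replaced by hand-written strings. The only point it shows is that expertise is added through the prompt, so the agent's parameters stay untouched.

```python
from dataclasses import dataclass


@dataclass
class Tutorial:
    """Hypothetical stand-in for a retrieved tutorial video, already annotated."""
    title: str
    planning_note: str   # high-level steps distilled from the video (assumed format)
    grounding_note: str  # hint about where a UI element sits (assumed format)


def score(task: str, tutorial: Tutorial) -> int:
    """Toy relevance score: number of words shared by the task and the title."""
    return len(set(task.lower().split()) & set(tutorial.title.lower().split()))


def retrieve(task: str, corpus: list[Tutorial], k: int = 1) -> list[Tutorial]:
    """Return up to k best-scoring tutorials, dropping those with no overlap."""
    ranked = sorted(corpus, key=lambda t: score(task, t), reverse=True)
    return [t for t in ranked[:k] if score(task, t) > 0]


def build_prompt(task: str, tutorials: list[Tutorial]) -> str:
    """Prepend annotations to the task text; no model weights are modified."""
    lines = []
    for t in tutorials:
        lines.append(f"[planning] {t.planning_note}")
        lines.append(f"[grounding] {t.grounding_note}")
    lines.append(f"[task] {task}")
    return "\n".join(lines)


corpus = [
    Tutorial("How to freeze panes in a spreadsheet",
             "Open the View menu, then choose Freeze.",
             "View is in the top menu bar."),
    Tutorial("How to crop an image",
             "Select the crop tool, drag, press Enter.",
             "The crop tool is in the left toolbox."),
]
task = "freeze the first row in a spreadsheet"
prompt = build_prompt(task, retrieve(task, corpus))
print(prompt)
```

The resulting string would be passed to an existing agent as extra context, which is what makes this kind of add-on plug-and-play in principle.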
Related papers (24)
- IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents · April 6, 2026 · arXiv
- GPA: Learning GUI Process Automation from Demonstrations · April 2, 2026 · arXiv
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience · January 22, 2026 · arXiv
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents · January 14, 2026 · arXiv
- Watch and Learn: Learning to Use Computers from Online Videos · October 6, 2025 · CVPR 2026
- Scaling Agents for Computer Use · October 2, 2025 · arXiv
- Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation · September 28, 2025 · arXiv
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents · August 19, 2025 · ICLR 2026 (Poster)
- Evolving in Tasks: Empowering the Multi-modality Large Language Model as the Computer Use Agent · August 6, 2025 · arXiv
- CoAct-1: Computer-using Multi-Agent System with Coding Actions · August 5, 2025 · ICLR 2026 (Poster)
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS · May 24, 2025 · arXiv
- Improved GUI Grounding via Iterative Narrowing · November 18, 2024 · arXiv
- Attacking Vision-Language Computer Agents via Pop-ups · November 4, 2024 · ACL 2025
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · April 11, 2024 · NeurIPS 2024 Datasets and Benchmarks Track
- STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models · June 1, 2026 · arXiv
- UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding · April 15, 2026 · arXiv
- GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis · April 6, 2026 · arXiv
- GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks · March 26, 2026 · CVPR 2026
- Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements · March 15, 2026 · arXiv
- M^2: Dual-Memory Augmentation for Long-Horizon Web Agents via Trajectory Summarization and Insight Retrieval · February 28, 2026 · arXiv
- Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion · February 6, 2026 · arXiv
- Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use Agents · February 3, 2026 · arXiv
- Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction · January 31, 2026 · arXiv
- Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution · January 30, 2026 · arXiv