ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

🏛 Institutions: Show Lab, NUS
📅 Date: December 20, 2023
📑 Publisher: CVPR 2024 (Poster)
💻 Env: Desktop
🔑 Keywords: benchmark, AssistGUI, desktop automation, GUI parser, actor-critic agent, Windows, productivity

TLDR

AssistGUI introduces a Windows desktop benchmark of 100 tasks across nine software applications, each paired with project files for evaluation. The paper also proposes an actor-critic agent with an LLM-driven GUI parser and reports that the best model still reaches only 46% success.


Related papers (24)

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

June 9, 2026 · arXiv
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

April 30, 2026 · arXiv
The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

April 12, 2026 · arXiv
HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks

April 10, 2026 · arXiv
Gym-Anything: Turn any Software into an Agent Environment

April 7, 2026 · arXiv
HippoCamp: Benchmarking Contextual Agents on Personal Computers

April 1, 2026 · arXiv
PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

March 9, 2026 · arXiv
OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

March 9, 2026 · arXiv
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents

February 9, 2026 · arXiv
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents

February 9, 2026 · arXiv
ANCHOR: Branch-Point Data Generation for GUI Agents

February 6, 2026 · arXiv
OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks

January 28, 2026 · arXiv
CUA-Skill: Develop Skills for Computer Using Agent

January 28, 2026 · arXiv
EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents

January 25, 2026 · arXiv
MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction

January 19, 2026 · arXiv
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

December 31, 2025 · arXiv
VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks

December 18, 2025 · arXiv
OS-Oracle: A Comprehensive Framework for Cross-Platform GUI Critic Models

December 18, 2025 · arXiv
Using GUI Agent for Electronic Design Automation

December 12, 2025 · arXiv
GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

November 6, 2025 · arXiv
CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent

October 21, 2025 · arXiv
NaturalGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset

August 2, 2025 · arXiv
MCPWorld: A Unified Benchmarking Testbed for API, GUI, and Hybrid Computer Use Agents

June 9, 2025 · arXiv
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

June 4, 2025 · NeurIPS 2025 (Poster)