GUI Agents Papers

SpecOps: A Fully Automated AI Agent Testing Framework in Real-World GUI Environments

Syed Yusuf Ahmed, Shiwei Feng, Chanwoo Bae, Calix Barrus, Xiangyu Zhang

🏛 Institutions
Purdue University, University of Texas at San Antonio
📅 Date
March 10, 2026
📑 Publisher
ICSE 2026
💻 Env
Desktop, Web
🔑 Keywords
TLDR

SpecOps is a fully automated testing framework for real-world software agents. It uses four specialist agents to generate test cases, set up environments, execute tasks, and validate outcomes. Across five deployed agents spanning CLI tools, web apps, and browser extensions, it finds 164 true bugs with an F1 score of 0.89, while keeping each test under eight minutes and under $0.73 in cost.
