SpecOps: A Fully Automated AI Agent Testing Framework in Real-World GUI Environments

Syed Yusuf Ahmed, Shiwei Feng, Chanwoo Bae, Calix Barrus, Xiangyu Zhang

🏛 Institutions: Purdue University, University of Texas at San Antonio
📅 Date: March 10, 2026
📑 Publisher: ICSE 2026
💻 Env: Desktop, Web
🔑 Keywords: testing framework, bug finding, specialist agents, real-world evaluation, multimodal testing, SpecOps

TLDR

SpecOps is a fully automated testing framework that uses four specialist agents to generate test cases, set up environments, execute tasks, and validate outcomes for real-world software agents. Across five deployed agents spanning CLI tools, web apps, and browser extensions, it finds 164 true bugs with an F1 score of 0.89, while each test takes under eight minutes and costs under $0.73.
