GUITester: Enabling GUI Agents for Exploratory Defect Discovery

Yifei Gao, Jiang Wu, Xiaoyi Chen, Yifan Yang, Zhe Cui, Tianyi Ma, Jiaming Zhang, Jitao Sang

🏛 Institutions: Beijing Jiaotong University, Hithink Research, NTU
📅 Date: January 8, 2026
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: framework, benchmark, GUI testing, defect discovery, GUITestBench, GUITester

TLDR

GUITester targets exploratory defect discovery in mobile apps, where agents must both navigate and recognize that anomalous behavior is a product defect rather than their own mistake. It introduces GUITestBench with 143 tasks across 26 defects and a multi-agent framework that separates planning-execution from hierarchical reflection, reaching 48.90% F1 (Pass@3).
