GUI Agents Papers

You Don’t Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation

Yutong Bian, Xianhao Lin, Yupeng Xie, Tianyang Liu, Mingchen Zhuge, Siyuan Lu, Haoming Tang, Jinlin Wang, Jiayi Zhang, Jiaqi Chen, Xiangru Tang, Yongxin Ni, Sirui Hong, Chenglin Wu

🏛 Institutions
DeepWisdom, Fudan, HKUST(GZ), UC San Diego, KAUST, Westlake University, Stanford, Yale University, NUS
📅 Date
August 17, 2025
📑 Publisher
SEA @ NeurIPS 2025 (Poster)
💻 Env
General GUI
TLDR

RealDevWorld is an evaluation framework for repository-scale software generation that judges whether produced applications actually work when interacted with through their GUIs. It pairs a 194-task benchmark, RealDevBench, with AppEvalPilot, an agent-as-a-judge system for functional, visual, and runtime evaluation, and reports strong alignment with expert human assessments.

Related papers (24)