GUI Agents Papers

Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

Jiashu Yao, Heyan Huang, Daiqing Wu, Wangke Chen, Huaxi Ai, Haoyu Wen, Zeming Liu, Yuhang Guo

🏛 Institutions
BIT, THU, Beihang
📅 Date
June 3, 2026
📑 Publisher
arXiv
💻 Env
Mobile
TLDR

LivingScreen is a benchmark for "living-screen-native" GUI agents, which must act on continuously updating interfaces such as short-video platforms, where on-screen content changes between agent actions. Its evaluation of frontier models finds that none reaches human-level cost-accuracy performance and identifies over-observation and under-observation as key failure modes, motivating better observation-control capabilities.
