MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Qinyi Luo, Shunye Tang, Yuxiang Chai, Weifeng Lin, Han Xiao, WenHao Wang, Siheng Chen, Zhengxi Lu, Gao Wu, Hao Wang, Liang Liu, Yong Liu

🏛 Institutions: ZJU, Nankai University, CUHK, SJTU, vivo AI Lab
📅 Date: February 3, 2026
📑 Publisher: arXiv
💻 Env: Mobile
🔑 Keywords: benchmark, dataset, memory, evaluation, MemGUI-Eval, MemGUI-Bench

TLDR

MemGUI-Bench is a memory-focused benchmark for mobile GUI agents covering dynamic tasks that require cross-temporal and cross-spatial retention. Paired with MemGUI-Eval, it reveals large hidden memory deficits in current agents that standard benchmarks miss.

Related papers (24)

PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent
March 31, 2026 · arXiv

SecAgent: Efficient Mobile GUI Agent with Semantic Context
March 9, 2026 · arXiv

Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
February 24, 2026 · arXiv

AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild
February 12, 2026 · arXiv

VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics
February 6, 2026 · arXiv

SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis
January 26, 2026 · arXiv

SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
January 26, 2026 · ICLR 2026 (Poster)

MobileWorldBench: Towards Semantic World Modeling For Mobile Agents
December 16, 2025 · arXiv

NaturalGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset
August 2, 2025 · arXiv

FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
June 9, 2025 · ICLR 2026 (Poster)

MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation
April 30, 2025 · NAACL 2025 (System Demonstrations)

LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
April 18, 2025 · arXiv

AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
October 31, 2024 · ACL 2025

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
June 16, 2024 · ICLR 2025 (Poster)

LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation
April 12, 2024 · UIST 2024

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
April 8, 2024 · ECCV 2024 (Poster)

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
January 17, 2024 · ACL 2024

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
November 13, 2023 · arXiv

Android in the Wild: A Large-Scale Dataset for Android Device Control
July 19, 2023 · NeurIPS 2023 Datasets and Benchmarks Track

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility
February 4, 2022 · ECCV 2022

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
August 6, 2021 · UIST 2021

Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements
November 30, 2020 · EMNLP 2020

Mapping Natural Language Instructions to Mobile UI Action Sequences
July 31, 2020 · ACL 2020

WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark
April 13, 2026 · arXiv