PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

Zikang Liu, Junyi Li, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen

🏛 Institutions: Renmin University of China, NUS, Alibaba Group
📅 Date: October 1, 2025
📑 Publisher: arXiv
💻 Env: Mobile, Web
🔑 Keywords: active look-back, memory retrieval, screenshot retrieval, mobile navigation, PAL-UI

TLDR

PAL-UI equips vision-based GUI agents with active look-back instead of relying only on truncated history or coarse textual summaries. It combines dual-level summaries with a retrieval tool for recalling specific past screenshots during planning, improving long-horizon mobile navigation and transferring to web navigation without additional training.
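
As a rough illustration of the look-back idea, the sketch below keeps a compact text summary of every step in the planner's context and exposes a tool that returns the full screenshot for one past step on request. It is not the authors' implementation. The class and method names (`Step`, `LookBackMemory`, `record`, `summary_context`, `look_back`) are hypothetical, and so is the assumption that the two summary levels are an observation note and an action note per step.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One past interaction step (all field names are hypothetical)."""
    observation_note: str  # short note on what the screen showed
    action_note: str       # short note on what the action did
    screenshot: bytes      # raw image, kept out of the prompt by default


@dataclass
class LookBackMemory:
    """Stores summaries for every step and serves screenshots on demand."""
    steps: list[Step] = field(default_factory=list)

    def record(self, observation_note: str, action_note: str,
               screenshot: bytes) -> int:
        """Store a step and return its index in the history."""
        self.steps.append(Step(observation_note, action_note, screenshot))
        return len(self.steps) - 1

    def summary_context(self) -> str:
        """Compact text history that would always go into the planner prompt."""
        return "\n".join(
            f"[{i}] saw: {s.observation_note} | did: {s.action_note}"
            for i, s in enumerate(self.steps)
        )

    def look_back(self, step_index: int) -> bytes:
        """Retrieval tool: return the full screenshot for one past step."""
        if not 0 <= step_index < len(self.steps):
            raise IndexError(f"no step {step_index} in history")
        return self.steps[step_index].screenshot


if __name__ == "__main__":
    memory = LookBackMemory()
    memory.record("login form", "typed username", b"<png-0>")
    memory.record("order list", "tapped first order", b"<png-1>")
    print(memory.summary_context())
    # The planner decides it needs the exact pixels of step 1 again.
    image = memory.look_back(1)
```

The point of the split is that the text summaries are cheap enough to keep for the whole episode, while screenshots enter the context only when the planner explicitly asks for one.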
