GUI Agents Papers

Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

Ziwei Liu, Tao Feng, Borui Kang, Yanbing Yang, Jun Luo

🏛 Institutions
Sichuan University, Tsinghua, NJU, NTU
📅 Date
March 15, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

ZoomUI is a training-free GUI grounding method built on the idea that complex interfaces can be decomposed into simpler visual elements that generic MLLMs already understand. It rewrites each instruction into element-level visual descriptions and progressively zooms in on candidate UI regions, matching or surpassing fine-tuned baselines without additional training.
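
To make the "progressively zooms" idea concrete, here is a minimal sketch of one way such a loop could look. It is not the paper's implementation: the quadrant split, the `min_size` stopping rule, and the names `split_into_quadrants`, `progressive_zoom`, and `score_region` are all assumptions for illustration. In a real system `score_region` would stand in for an MLLM call that rates how well a crop matches the element-level description; here it is a plain callable.

```python
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels; hypothetical representation.
Box = Tuple[float, float, float, float]


def split_into_quadrants(box: Box) -> List[Box]:
    """Split a box into four equal, non-overlapping quadrants."""
    left, top, right, bottom = box
    mid_x = (left + right) / 2
    mid_y = (top + bottom) / 2
    return [
        (left, top, mid_x, mid_y),      # top-left
        (mid_x, top, right, mid_y),     # top-right
        (left, mid_y, mid_x, bottom),   # bottom-left
        (mid_x, mid_y, right, bottom),  # bottom-right
    ]


def progressive_zoom(
    screen: Box,
    score_region: Callable[[Box], float],
    min_size: float = 64.0,
    max_steps: int = 8,
) -> Tuple[Box, Tuple[float, float]]:
    """Repeatedly narrow the search to the best-scoring quadrant.

    `score_region` is an assumed stand-in for a model that scores how
    likely a region is to contain the described element.
    Returns the final box and its center as the predicted click point.
    """
    box = screen
    for _ in range(max_steps):
        left, top, right, bottom = box
        # Stop once the region is small enough in both dimensions.
        if (right - left) <= min_size and (bottom - top) <= min_size:
            break
        box = max(split_into_quadrants(box), key=score_region)
    left, top, right, bottom = box
    return box, ((left + right) / 2, (top + bottom) / 2)


def make_point_scorer(x: float, y: float) -> Callable[[Box], float]:
    """Toy scorer: 1.0 if the region contains the point (x, y), else 0.0."""
    def score(box: Box) -> float:
        left, top, right, bottom = box
        return 1.0 if (left <= x < right and top <= y < bottom) else 0.0
    return score
```

With the toy scorer, `progressive_zoom((0, 0, 1024, 1024), make_point_scorer(812, 140))` narrows a 1024×1024 screen down to a 64×64 region around the target. A fixed quadrant split can cut through an element that straddles a boundary, so a practical variant would likely use overlapping or model-proposed candidate regions instead.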
