GUI Agents Papers

Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

Ziwei Liu, Tao Feng, Borui Kang, Yanbing Yang, Jun Luo

🏛 Institutions
Sichuan University, Tsinghua, NJU, NTU
📅 Date
March 15, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

ZoomUI is a training-free GUI grounding method built on the idea that complex interfaces can be decomposed into simpler visual elements that generic MLLMs already understand. It rewrites each instruction into an element-level visual description and progressively zooms in on candidate UI regions, matching or surpassing fine-tuned baselines without additional training.
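
The progressive zoom described above can be illustrated with a minimal coarse-to-fine sketch. This is not the paper's implementation: the 2x2 split, the stopping rule, and the names `split_2x2`, `progressive_zoom`, and `score_region` are all assumptions made for illustration. In particular, `score_region` is a hypothetical stand-in for asking an MLLM how well a cropped region matches the element-level description, and the description is assumed to have been rewritten from the instruction already.

```python
from typing import Callable, List, Tuple

# A region of the screenshot as (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def split_2x2(box: Box) -> List[Box]:
    """Split a region into four equal quadrants (an assumed candidate scheme)."""
    left, top, right, bottom = box
    mid_x = (left + right) / 2
    mid_y = (top + bottom) / 2
    return [
        (left, top, mid_x, mid_y),      # top-left
        (mid_x, top, right, mid_y),     # top-right
        (left, mid_y, mid_x, bottom),   # bottom-left
        (mid_x, mid_y, right, bottom),  # bottom-right
    ]


def progressive_zoom(
    screen: Box,
    description: str,
    score_region: Callable[[Box, str], float],
    min_size: float = 32.0,
    max_steps: int = 10,
) -> Tuple[float, float]:
    """Repeatedly zoom into the best-scoring sub-region and return its center.

    score_region is hypothetical: in a real system it would crop the
    screenshot to the box and query an MLLM with the description.
    """
    region = screen
    for _ in range(max_steps):
        left, top, right, bottom = region
        # Stop once the region is small enough to treat as a click target.
        if min(right - left, bottom - top) <= min_size:
            break
        # Keep the candidate that the scorer rates highest.
        region = max(split_2x2(region), key=lambda b: score_region(b, description))
    left, top, right, bottom = region
    return ((left + right) / 2, (top + bottom) / 2)
```

With a toy scorer that returns 1.0 only for regions containing a known target point, the loop narrows a 1024x1024 screen to a 32x32 region around that point in five steps. A real scorer would be noisy, so a practical version would likely need overlapping candidates or backtracking, which this sketch omits.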
