Zoom to Essence: Trainless GUI Grounding by Inferring upon Interface Elements

Ziwei Liu, Tao Feng, Borui Kang, Yanbing Yang, Jun Luo

🏛 Institutions: Sichuan University, Tsinghua, NJU, NTU
📅 Date: March 15, 2026
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: GUI grounding, training-free, inference scaling, ZoomUI, instruction rewriting, progressive zooming

TLDR

ZoomUI is a training-free GUI grounding method built on the idea that complex interfaces can be decomposed into simpler visual elements that general-purpose multimodal LLMs (MLLMs) already understand. It rewrites each instruction into an element-level visual description and progressively zooms in on candidate UI regions, matching or surpassing fine-tuned baselines without additional training.
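
The rewrite-then-zoom idea can be illustrated with a minimal sketch. This is not the authors' implementation: `rewrite`, `locate`, `crop_around`, `ground`, and the `shrink`, `min_size`, and `max_steps` parameters are hypothetical names chosen for illustration, and the two callables stand in for MLLM calls that the summary does not specify. The sketch assumes the model returns a point in full-screen coordinates and that each step crops a smaller region around that point.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels
Point = Tuple[float, float]              # (x, y) in screen pixels


def crop_around(center: Point, region: Box, shrink: float) -> Box:
    """Return a box `shrink` times the size of `region`, centered as close
    to `center` as possible while staying inside `region`."""
    left, top, right, bottom = region
    w = (right - left) * shrink
    h = (bottom - top) * shrink
    cx = min(max(center[0], left + w / 2), right - w / 2)
    cy = min(max(center[1], top + h / 2), bottom - h / 2)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def ground(
    instruction: str,
    rewrite: Callable[[str], str],          # hypothetical: instruction -> visual description
    locate: Callable[[str, Box], Point],    # hypothetical: (description, region) -> point
    screen: Box,
    shrink: float = 0.5,
    min_size: float = 64.0,
    max_steps: int = 5,
) -> Tuple[Point, int]:
    """Rewrite the instruction once, then re-query on ever smaller crops."""
    description = rewrite(instruction)
    region = screen
    point = locate(description, region)
    steps = 1
    while steps < max_steps:
        candidate = crop_around(point, region, shrink)
        # Stop before the crop becomes too small to be useful.
        if min(candidate[2] - candidate[0], candidate[3] - candidate[1]) < min_size:
            break
        region = candidate
        point = locate(description, region)
        steps += 1
    return point, steps


# Stand-ins for the model calls, used only to exercise the loop.
TARGET: Point = (900.0, 500.0)


def fake_rewrite(instruction: str) -> str:
    return "a small gear-shaped icon in the toolbar"


def fake_locate(description: str, region: Box) -> Point:
    # Simulated error proportional to the region size: smaller crops
    # give more precise answers.
    left, top, right, bottom = region
    return (TARGET[0] + 0.1 * (right - left), TARGET[1] + 0.1 * (bottom - top))


if __name__ == "__main__":
    print(ground("open settings", fake_rewrite, fake_locate, (0.0, 0.0, 1920.0, 1080.0)))
```

In this toy setup the simulated localization error shrinks with each crop, which mirrors the intuition that a smaller, simpler view is easier for a general-purpose model to read. A real system would replace both stand-ins with model calls and would need its own stopping rule.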
