GUI Agents Papers

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

Lai Wei, Liangbo He, Jun Lan, Lingzhong Dong, Yutong Cai, Siyuan Li, Huijia Zhu, Weiqiang Wang, Linghe Kong, Yue Wang, Zhuosheng Zhang, Weiran Huang

🏛 Institutions
SJTU, Ant Group, Zhongguancun Academy, Shanghai Innovation Institute
📅 Date
February 12, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

This paper proposes Region-to-Image Distillation, which trains a model to internalize zoom-in behavior so that it needs no explicit crop-and-reason step at test time. It also introduces ZoomBench and reports improved fine-grained perception on both perception and GUI-agent benchmarks.
