GUI Agents Papers

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Zhiqi Ge, Juncheng Li, Xinglei Pang, Minghe Gao, Kaihang Pan, Wang Lin, Hao Fei, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

🏛 Institutions
ZJU, NUS
📅 Date
December 13, 2024
📑 Publisher
arXiv
💻 Env
Desktop, Web
🔑 Keywords
TLDR

Iris targets the visual-perception bottleneck of GUI agents in high-resolution, visually complex interfaces. It combines information-sensitive cropping with a self-refining dual-learning loop between referring and grounding, and the resulting gains transfer to both web and OS downstream tasks.
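
To make the idea of information-sensitive cropping concrete, here is a minimal sketch. It is not the paper's implementation: this entry does not describe the actual algorithm, so the edge-density score, the quadtree split, and the names `edge_density`, `adaptive_crops`, `threshold`, and `min_size` are all illustrative assumptions. The sketch only shows the general principle that visually dense regions receive smaller crops while sparse regions stay as large crops.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width)


def edge_density(img: List[List[int]], top: int, left: int, h: int, w: int) -> float:
    """Mean absolute difference between adjacent pixels inside a region.

    Assumption: this stands in for an "information" score; the paper may
    use a different measure.
    """
    total = 0
    count = 0
    for r in range(top, top + h):
        for c in range(left, left + w):
            if c + 1 < left + w:  # horizontal neighbour inside the region
                total += abs(img[r][c] - img[r][c + 1])
                count += 1
            if r + 1 < top + h:  # vertical neighbour inside the region
                total += abs(img[r][c] - img[r + 1][c])
                count += 1
    return total / count if count else 0.0


def adaptive_crops(img: List[List[int]], threshold: float = 10.0,
                   min_size: int = 2) -> List[Box]:
    """Recursively split regions whose score exceeds `threshold` (hypothetical)."""
    boxes: List[Box] = []

    def split(top: int, left: int, h: int, w: int) -> None:
        too_small = h <= min_size or w <= min_size
        if too_small or edge_density(img, top, left, h, w) <= threshold:
            boxes.append((top, left, h, w))
            return
        hh, hw = h // 2, w // 2
        split(top, left, hh, hw)                    # top-left quadrant
        split(top, left + hw, hh, w - hw)           # top-right quadrant
        split(top + hh, left, h - hh, hw)           # bottom-left quadrant
        split(top + hh, left + hw, h - hh, w - hw)  # bottom-right quadrant

    split(0, 0, len(img), len(img[0]))
    return boxes
```

On a grayscale screenshot represented as a 2D list of intensities, a busy toolbar region would be split into several small crops, while a blank background would be returned as one large crop, so a fixed visual-token budget is spent where the interface is dense.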
