GUI Agents Papers

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

🏛 Institutions
UC Santa Cruz, eBay Inc., Cybever
📅 Date
June 27, 2024
📑 Publisher
EMNLP 2024 (Poster)
💻 Env
General GUI
TLDR

This paper introduces the Screen Point-and-Read task, in which a model must explain the region indicated by a user's point on a GUI screenshot, and proposes the Tree-of-Lens agent to solve it. It also releases two resources: the ScreenPR benchmark, which covers mobile, web, and operating-system GUIs, and the ASHL dataset for hierarchical screen-region detection.
