GUI Agents Papers

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

🏛 Institutions
UC Santa Cruz, eBay Inc., Cybever
📅 Date
June 27, 2024
📑 Publisher
EMNLP 2024 (Poster)
💻 Env
General GUI
🔑 Keywords
TLDR

This paper introduces the Screen Point-and-Read task, in which a model must describe the region of a GUI screenshot that a user points to, and proposes the Tree-of-Lens agent to solve it. It also releases the ScreenPR benchmark, covering mobile, web, and operating-system GUIs, and the ASHL dataset for hierarchical screen-region detection.
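To make the "layout-aware, hierarchical" idea concrete, here is a minimal sketch, assuming a screen's layout is already available as a tree of nested bounding boxes (the kind of output hierarchical screen-region detection aims to produce). The `Region` class, the `lens_path` function, and the example coordinates are hypothetical and do not come from the paper; the sketch only shows how a pointed coordinate can select a chain of enclosing regions from coarse to fine. It is one plausible illustration of that idea, not the authors' Tree-of-Lens implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical layout node: a named box (x0, y0)-(x1, y1) with nested sub-regions."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    children: list[Region] = field(default_factory=list)

    def contains(self, x: float, y: float) -> bool:
        # Inclusive bounds check for the pointed coordinate.
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def lens_path(root: Region, x: float, y: float) -> list[str]:
    """Return names of the nested regions containing (x, y), from coarsest to finest.

    Assumption: sibling regions do not overlap, so the first matching child is taken.
    """
    path: list[str] = []
    node = root if root.contains(x, y) else None
    while node is not None:
        path.append(node.name)
        # Descend into the child region that also contains the point, if any.
        node = next((c for c in node.children if c.contains(x, y)), None)
    return path


# Hypothetical mobile screen layout (pixel coordinates are illustrative only).
screen = Region("screen", 0, 0, 1080, 2400, [
    Region("toolbar", 0, 0, 1080, 200, [Region("search_box", 100, 50, 900, 150)]),
    Region("content", 0, 200, 1080, 2400),
])

print(lens_path(screen, 500, 100))  # ['screen', 'toolbar', 'search_box']
```

A pointer on the search box yields the chain screen, toolbar, search box; a reader-facing model could then describe the point at each of these levels of context rather than in isolation.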
