GUI Agents Papers

VGA: Vision GUI Assistant - Minimizing Hallucinations through Image-Centric Fine-Tuning

Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

🏛 Institutions
East China Normal University
📅 Date
June 20, 2024
📑 Publisher
Findings of EMNLP 2024
💻 Env
General GUI
🔑 Keywords
TLDR

VGA is a GUI-understanding model fine-tuned to reduce hallucinations that arise when a model relies on textual priors instead of on-screen visual evidence. The paper builds a 63.8k-sample GUI VQA dataset with the Referent Method and uses a two-stage Foundation and Advanced Comprehension (FAC) training scheme to produce more visually grounded answers.
