GUI Agents Papers

Trifuse: Enhancing Attention-Based GUI Grounding via Multimodal Fusion

Longhui Ma, Di Zhao, Siwei Wang, Zhao Lv, Miao Wang

🏛 Institutions
National University of Defense Technology, Academy of Military Sciences
📅 Date
February 6, 2026
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

Trifuse is a training-free GUI grounding method that fuses MLLM attention, OCR text, and icon caption signals through a consensus-based fusion strategy. It improves grounding across multiple benchmarks without task-specific fine-tuning.
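The summary above does not spell out the fusion rule, so the following is only an illustrative sketch of what a consensus-style fusion of three signals could look like, not the authors' implementation. It assumes each signal (MLLM attention, OCR text match, icon caption match) has already been rendered as a per-pixel score map of the same shape; the function names, the vote threshold, and the weighting are all hypothetical.

```python
import numpy as np


def normalize(score_map):
    """Rescale a score map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(score_map, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span


def consensus_fuse(attention, ocr, icon, threshold=0.5):
    """Fuse three score maps, favoring locations where the signals agree.

    Assumption: "consensus" is modeled as the mean score scaled by the
    fraction of modalities whose normalized score reaches `threshold`.
    """
    maps = np.stack([normalize(attention), normalize(ocr), normalize(icon)])
    votes = (maps >= threshold).sum(axis=0)  # 0..3 agreeing modalities
    return maps.mean(axis=0) * (votes / maps.shape[0])


def predict_point(fused):
    """Return the (x, y) location of the highest fused score."""
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return int(col), int(row)
```

With this weighting, a location that only one signal favors is scaled down to a third of its mean score, while a location supported by all three keeps its full mean. That is one simple way to keep a single noisy attention peak from deciding the prediction on its own.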
