GUI Agents Papers

A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants

Hans G.W. van Dam

🏛 Institutions
uxx.ai
📅 Date
August 31, 2025
📑 Publisher
arXiv
💻 Env
General GUI
🔑 Keywords
TLDR

This paper proposes a GUI architecture built on the Model Context Protocol (MCP) that lets existing applications expose their navigation structure and action semantics to speech-enabled assistants through ViewModels and a GUI tree router. The design targets multimodal interaction in which spoken and visual feedback stay aligned, and the paper also reports a small evaluation of locally deployable open-weight models for this setting.
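To make the summary concrete, here is a minimal sketch of how ViewModels and a tree router might expose actions to an assistant. It is not the paper's implementation: the names `Action`, `ViewModel`, `GuiTreeRouter`, `list_tools`, and `call`, the slash-separated path scheme, and the idea of returning one feedback string per action are all assumptions made for illustration, and no MCP library is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """Hypothetical action descriptor: a name, a description an assistant can read, and a handler."""
    name: str
    description: str
    handler: Callable[..., str]


@dataclass
class ViewModel:
    """Hypothetical ViewModel: one node of the GUI tree with its own actions and child views."""
    name: str
    actions: Dict[str, Action] = field(default_factory=dict)
    children: Dict[str, "ViewModel"] = field(default_factory=dict)

    def add_action(self, action: Action) -> None:
        self.actions[action.name] = action

    def add_child(self, child: "ViewModel") -> None:
        self.children[child.name] = child


class GuiTreeRouter:
    """Hypothetical router: resolves slash-separated paths to ViewModels and dispatches calls."""

    def __init__(self, root: ViewModel) -> None:
        self.root = root

    def resolve(self, path: str) -> ViewModel:
        # Walk the tree one path segment at a time, starting at the root.
        node = self.root
        for part in [p for p in path.split("/") if p]:
            if part not in node.children:
                raise KeyError(f"no view named {part!r} under {node.name!r}")
            node = node.children[part]
        return node

    def list_tools(self) -> List[dict]:
        # Flatten the tree into tool-like descriptors (assumed shape, not the MCP schema).
        tools = []
        stack = [("", self.root)]
        while stack:
            prefix, node = stack.pop()
            for action in node.actions.values():
                tools.append({
                    "path": prefix or "/",
                    "name": action.name,
                    "description": action.description,
                })
            for child in node.children.values():
                stack.append((f"{prefix}/{child.name}", child))
        return sorted(tools, key=lambda t: (t["path"], t["name"]))

    def call(self, path: str, action_name: str, **kwargs) -> str:
        # Route the request to the addressed view, then run the named action.
        view = self.resolve(path)
        if action_name not in view.actions:
            raise KeyError(f"view {view.name!r} has no action {action_name!r}")
        return view.actions[action_name].handler(**kwargs)


# Example: a settings view exposing one action.
root = ViewModel("app")
settings = ViewModel("settings")
settings.add_action(
    Action("set_theme", "Switch the colour theme", lambda theme: f"Theme set to {theme}")
)
root.add_child(settings)

router = GuiTreeRouter(root)
result = router.call("/settings", "set_theme", theme="dark")
print(result)
```

In this sketch the string returned by a handler could be both shown in the GUI and spoken by the assistant, which is one simple way to keep the two feedback channels aligned; the paper may handle this differently.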
