A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants

🏛 Institutions: uxx.ai
📅 Date: August 31, 2025
📑 Publisher: arXiv
💻 Env: General GUI
🔑 Keywords: MCP, MVVM, GUI tree router, speech-enabled assistants, voice accessibility

TLDR

This paper proposes a GUI architecture driven by the Model Context Protocol (MCP) that lets existing applications expose navigation structure and action semantics to speech-enabled assistants through ViewModels and a GUI tree router. The design targets multimodal interaction with aligned spoken and visual feedback, and the paper also reports a small evaluation of locally deployable open-weight models for this setting.
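
To make the idea concrete, here is a minimal Python sketch of how ViewModels and a GUI tree router might expose navigation structure and action semantics to an assistant. It is an illustration under stated assumptions, not the paper's implementation: the `ViewModel` and `GuiTreeRouter` classes, the `describe`, `navigate`, and `invoke` methods, and the `set_volume` action are all hypothetical names, and no real MCP library is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ViewModel:
    """Hypothetical ViewModel: the actions and child views one screen offers."""
    name: str
    actions: Dict[str, Callable[..., str]] = field(default_factory=dict)
    children: Dict[str, "ViewModel"] = field(default_factory=dict)

    def add_child(self, child: "ViewModel") -> "ViewModel":
        self.children[child.name] = child
        return child


class GuiTreeRouter:
    """Hypothetical router: tracks the active view and dispatches actions."""

    def __init__(self, root: ViewModel) -> None:
        self.root = root
        self.path: List[str] = []  # child names from the root to the active view

    def current(self) -> ViewModel:
        node = self.root
        for segment in self.path:
            node = node.children[segment]
        return node

    def describe(self) -> dict:
        """Summarise where the user is and what can be done from here."""
        node = self.current()
        return {
            "view": "/" + "/".join(self.path),
            "actions": sorted(node.actions),
            "navigate_to": sorted(node.children),
        }

    def navigate(self, child_name: str) -> str:
        if child_name not in self.current().children:
            raise KeyError(f"no child view named {child_name!r}")
        self.path.append(child_name)
        return f"Opened {child_name}"

    def invoke(self, action: str, **kwargs) -> str:
        node = self.current()
        if action not in node.actions:
            raise KeyError(f"view {node.name!r} has no action {action!r}")
        return node.actions[action](**kwargs)


# Assumed example application: a home view with a settings child view.
root = ViewModel("home")
settings = root.add_child(ViewModel("settings"))
state = {"volume": 5}  # stands in for state the visual UI would render


def set_volume(level: int) -> str:
    state["volume"] = level          # visual side: state the view displays
    return f"Volume set to {level}"  # spoken side: text an assistant could read out


settings.actions["set_volume"] = set_volume

router = GuiTreeRouter(root)
router.navigate("settings")
message = router.invoke("set_volume", level=8)
```

In this sketch each action both updates the state a view would render and returns a sentence an assistant could speak, which is one simple way to keep spoken and visual feedback aligned. The `describe` output is the kind of structure a server could advertise to an assistant as its currently available tools.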
