A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Hans G.W. van Dam
- 🏛 Institutions: uxx.ai
- 📅 Date: August 31, 2025
- 📑 Publisher: arXiv
- 💻 Env: General GUI
TLDR
This paper proposes a GUI architecture driven by the Model Context Protocol (MCP) that lets existing applications expose their navigation structure and action semantics to speech-enabled assistants through ViewModels and a GUI tree router. The design targets multimodal interaction in which spoken and visual feedback stay aligned. The paper also reports a small evaluation of locally deployable open-weight models in this setting.
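To make the idea of ViewModels exposing navigation and actions through a router more concrete, here is a minimal sketch. Every name in it (`ViewModel`, `GuiTreeRouter`, `expose`, `list_tools`, `call`, the `navigate` tool, the `route.action` naming scheme) is hypothetical and is not taken from the paper or from any MCP SDK. The tool descriptors only loosely mimic an MCP tool listing with a name and description; real MCP tools also carry an input schema, which is omitted here. The `screen` field in each reply is an illustrative stand-in for keeping spoken and visual feedback aligned.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Action:
    """One operation a ViewModel exposes to the assistant (hypothetical)."""
    name: str
    description: str
    handler: Callable[..., str]


@dataclass
class ViewModel:
    """Hypothetical ViewModel: one GUI screen plus the actions it exposes."""
    route: str
    actions: Dict[str, Action] = field(default_factory=dict)

    def expose(self, name: str, description: str, handler: Callable[..., str]) -> None:
        self.actions[name] = Action(name, description, handler)


class GuiTreeRouter:
    """Hypothetical router that publishes screens and actions as tool descriptors."""

    def __init__(self) -> None:
        self._views: Dict[str, ViewModel] = {}
        self.current: Optional[str] = None  # route currently shown on screen

    def register(self, vm: ViewModel) -> None:
        self._views[vm.route] = vm

    def list_tools(self) -> List[Dict[str, str]]:
        # One navigation tool, followed by one tool per exposed action
        # (MCP-style name/description pairs only; no input schema).
        tools = [{"name": "navigate",
                  "description": "Show a screen: " + ", ".join(sorted(self._views))}]
        for route in sorted(self._views):
            for action in self._views[route].actions.values():
                tools.append({"name": f"{route}.{action.name}",
                              "description": action.description})
        return tools

    def call(self, tool: str, **kwargs: Any) -> Dict[str, Optional[str]]:
        if tool == "navigate":
            route = kwargs["route"]
            if route not in self._views:
                raise KeyError(f"unknown route: {route}")
            self.current = route
            return {"say": f"Showing {route}", "screen": self.current}
        route, _, name = tool.rpartition(".")
        vm = self._views[route]
        # Switch to the owning screen first so the spoken reply and the
        # visible GUI state refer to the same view.
        self.current = route
        return {"say": vm.actions[name].handler(**kwargs), "screen": self.current}


if __name__ == "__main__":
    volume = {"level": 5}

    def set_volume(level: int) -> str:
        volume["level"] = level
        return f"Volume set to {level}"

    router = GuiTreeRouter()
    audio = ViewModel("audio_settings")
    audio.expose("set_volume", "Set the output volume (0-10)", set_volume)
    router.register(audio)
    router.register(ViewModel("home"))
    print([t["name"] for t in router.list_tools()])
    # ['navigate', 'audio_settings.set_volume']
    print(router.call("audio_settings.set_volume", level=7))
    # {'say': 'Volume set to 7', 'screen': 'audio_settings'}
```

In this sketch the assistant never drives widgets directly: it only sees the flattened list of tools, and every call routes through the screen that owns the action, so the GUI moves to that screen before the reply is spoken.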
Related papers
- LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios · February 3, 2026 · arXiv
- EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning · April 10, 2026 · arXiv
- MAI-UI Technical Report: Real-World Centric Foundation GUI Agents · December 26, 2025 · arXiv
- MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments · December 22, 2025 · arXiv