META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu
- 🏛 Institutions
- SJTU
- 📅 Date
- May 23, 2022
- 📑 Publisher
- EMNLP 2022
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
META-GUI introduces GUI-based task-oriented dialogue (GUI-TOD), an architecture in which the assistant operates real mobile apps directly instead of calling task-specific backend APIs. The paper also releases the META-GUI dataset for training multi-modal conversational agents under this setup.
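The core loop behind a GUI-TOD agent can be sketched as follows: at each step, the agent observes the dialogue history plus the current app screen and emits either a GUI action (click, input, swipe) or a reply to the user, with no backend API involved. This is a minimal illustrative sketch; all class and function names here are hypothetical, and the trivial matching policy stands in for the paper's learned multi-modal model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical types illustrating the GUI-TOD loop: the agent acts on
# the screen itself rather than calling a task-specific backend API.

@dataclass
class UIElement:
    elem_id: int
    text: str

@dataclass
class Action:
    kind: str          # "click", "input", "swipe", or "reply"
    target: int = -1   # element id for on-screen actions
    value: str = ""    # typed text or reply utterance

def agent_step(dialogue: List[str], screen: List[UIElement]) -> Action:
    """Toy policy: click the screen element whose label appears in the
    last user turn; otherwise reply asking for clarification. A real
    agent would replace this with a learned multi-modal policy."""
    last_turn = dialogue[-1].lower()
    for elem in screen:
        if elem.text.lower() in last_turn:
            return Action(kind="click", target=elem.elem_id)
    return Action(kind="reply", value="Could you rephrase that?")

# Usage: the agent clicks "Settings" instead of invoking a settings API.
screen = [UIElement(0, "Search"), UIElement(1, "Settings")]
action = agent_step(["Please open Settings"], screen)
```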
Related papers
- BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism · May 27, 2025 · EMNLP 2025 (Oral)
- ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation · April 30, 2025 · NAACL 2025 (Poster)
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation · April 30, 2025 · NAACL 2025 (System Demonstrations)
- LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark · April 18, 2025 · arXiv
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models · May 30, 2023 · NeurIPS 2023
- Grounding Open-Domain Instructions to Automate Web Support Tasks · March 30, 2021 · NAACL 2021