META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu
- 🏛 Institutions
- SJTU
- 📅 Date
- May 23, 2022
- 📑 Publisher
- EMNLP 2022
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
META-GUI introduces GUI-based task-oriented dialogue (GUI-TOD), an architecture in which the assistant operates real mobile apps directly instead of calling task-specific backend APIs. The paper also releases the META-GUI dataset for training multi-modal conversational agents under this setup.
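The core loop behind a GUI-TOD agent can be sketched as follows: at each step, the agent observes the dialogue history plus the current app screen and emits either a GUI action (click, input, swipe) or a reply to the user, with no backend API involved. This is a minimal illustrative sketch; all class and function names here are hypothetical, and the trivial matching policy stands in for the paper's learned multi-modal model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical types illustrating the GUI-TOD loop: the agent acts on
# the screen itself rather than calling a task-specific backend API.

@dataclass
class UIElement:
    elem_id: int
    text: str

@dataclass
class Action:
    kind: str          # "click", "input", "swipe", or "reply"
    target: int = -1   # element id for on-screen actions
    value: str = ""    # typed text or reply utterance

def agent_step(dialogue: List[str], screen: List[UIElement]) -> Action:
    """Toy policy: click the screen element whose label appears in the
    last user turn; otherwise reply asking for clarification. A real
    agent would replace this with a learned multi-modal policy."""
    last_turn = dialogue[-1].lower()
    for elem in screen:
        if elem.text.lower() in last_turn:
            return Action(kind="click", target=elem.elem_id)
    return Action(kind="reply", value="Could you rephrase that?")

# Usage: the agent clicks "Settings" instead of invoking a settings API.
screen = [UIElement(0, "Search"), UIElement(1, "Settings")]
action = agent_step(["Please open Settings"], screen)
```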
Related papers
- BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism · May 27, 2025 · EMNLP 2025 (Oral)
- ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation · April 30, 2025 · NAACL 2025 (Poster)
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation · April 30, 2025 · NAACL 2025 (System Demonstrations)
- LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark · April 18, 2025 · arXiv
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models · May 30, 2023 · NeurIPS 2023
- Grounding Open-Domain Instructions to Automate Web Support Tasks · March 30, 2021 · NAACL 2021