AppAgent: Multimodal Agents as Smartphone Users
Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
- 🏛 Institutions: Tencent
- 📅 Date: December 21, 2023
- 📑 Publisher: CHI 2025
- 💻 Env: Mobile
- 🔑 Keywords
TLDR
AppAgent is a smartphone-use agent that operates through a simple tap-and-swipe action space without backend app access. It learns app usage through autonomous exploration or human demonstrations, stores that knowledge in a reference document, and is evaluated on 50 tasks across 10 apps.
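To make the summary above concrete, here is a minimal sketch of what a tap-and-swipe action space and a per-app reference document could look like. It is not the authors' implementation: the class names (`Tap`, `Swipe`, `ReferenceDoc`), the numeric element IDs, and the `describe` helper are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class Tap:
    # Hypothetical: index of a labeled UI element on the current screen.
    element_id: int


@dataclass(frozen=True)
class Swipe:
    # Hypothetical: element to swipe on, plus a direction such as "up" or "left".
    element_id: int
    direction: str


Action = Union[Tap, Swipe]


@dataclass
class ReferenceDoc:
    """Per-app notes on what each UI element does (illustrative structure only)."""

    notes: Dict[str, str] = field(default_factory=dict)

    def record(self, element_name: str, observation: str) -> None:
        # In this sketch a later observation simply overwrites an earlier one.
        self.notes[element_name] = observation

    def lookup(self, element_name: str) -> str:
        # Fall back to a placeholder when the element has not been explored yet.
        return self.notes.get(element_name, "no notes yet")


def describe(action: Action) -> str:
    """Render an action as a short string, e.g. for logging or prompting."""
    if isinstance(action, Tap):
        return f"tap({action.element_id})"
    return f"swipe({action.element_id}, {action.direction!r})"


# Usage: record knowledge during exploration or a demonstration, then consult it.
doc = ReferenceDoc()
doc.record("send_button", "Sends the drafted message")
print(describe(Tap(3)), "|", doc.lookup("send_button"))
```

Keeping the actions as small immutable records means the agent needs no access to the app's backend: everything it does is expressed as an on-screen gesture, and everything it learns lives in the document.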
Related papers
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent · May 22, 2025 · ACL 2025
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions · August 5, 2024 · arXiv
- AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations · November 24, 2024 · ACL 2025
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph · January 24, 2026 · Journal of Intelligent Computing and Networking
- GUITester: Enabling GUI Agents for Exploratory Defect Discovery · January 8, 2026 · arXiv