Agent-SAMA: State-Aware Mobile Assistant
Linqiang Guo, Wei Liu, Yi Wen Heng, Tse-Hsun Chen, Yang Wang
- 🏛 Institutions
- SPEAR Lab, Concordia University
- 📅 Date
- May 29, 2025
- 📑 Publisher
- AAAI 2026
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
Agent-SAMA addresses the reactive behavior of existing mobile agents by explicitly modeling app navigation as a finite state machine. Its four-agent framework uses that state structure for planning, verification, and recovery, improving both task success and recovery rates on cross-app mobile benchmarks.
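To make the finite-state-machine idea concrete, here is a minimal sketch, not the paper's implementation. It assumes screens are states and UI actions are transitions; the class `NavigationFSM`, its methods, and the screen and action names are all hypothetical and chosen only for illustration. Planning is shown as a breadth-first search over known transitions, verification as a check of the observed screen against the expected one, and recovery as re-planning from the screen actually reached.

```python
from collections import deque


class NavigationFSM:
    """Hypothetical model: screens are states, UI actions are transitions."""

    def __init__(self):
        # Maps state -> {action: next_state}.
        self.transitions = {}

    def add_transition(self, state, action, next_state):
        self.transitions.setdefault(state, {})[action] = next_state
        # Register the target so every known state has an entry.
        self.transitions.setdefault(next_state, {})

    def plan(self, start, goal):
        """Return the shortest action list from start to goal, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, path = queue.popleft()
            if state == goal:
                return path
            for action, nxt in self.transitions.get(state, {}).items():
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action]))
        return None

    def verify(self, state, action, observed):
        """True if the observed screen matches the expected transition."""
        return self.transitions.get(state, {}).get(action) == observed


# Illustrative screens and actions (assumed, not taken from the paper).
fsm = NavigationFSM()
fsm.add_transition("home", "open_settings", "settings")
fsm.add_transition("settings", "tap_wifi", "wifi")
fsm.add_transition("settings", "tap_back", "home")
fsm.add_transition("popup", "dismiss", "settings")

plan = fsm.plan("home", "wifi")

# Suppose tapping "tap_wifi" unexpectedly lands on a popup: verification
# fails, and recovery re-plans from the screen that was actually reached.
ok = fsm.verify("settings", "tap_wifi", "popup")
recovery = fsm.plan("popup", "wifi")
```

In this sketch, an agent that tracks the state graph can notice a deviation and route back to the goal rather than reacting to each screen in isolation, which is the contrast with reactive behavior that the summary draws.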
Related papers (24)
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · May 25, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph · January 24, 2026 · Journal of Intelligent Computing and Networking
- GUITester: Enabling GUI Agents for Exploratory Defect Discovery · January 8, 2026 · arXiv
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv
- CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs · October 17, 2025 · NeurIPS 2025 (Poster)
- BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism · May 27, 2025 · EMNLP 2025 (Oral)
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent · May 22, 2025 · ACL 2025
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent · May 20, 2025 · arXiv
- ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation · April 30, 2025 · NAACL 2025 (Poster)
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation · April 30, 2025 · NAACL 2025 (System Demonstrations)
- LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark · April 18, 2025 · arXiv
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents · October 18, 2024 · ICLR 2025 (Poster)
- ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents · October 9, 2024 · SIGDIAL 2025
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions · August 5, 2024 · arXiv
- MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices · July 4, 2024 · arXiv
- AppAgent: Multimodal Agents as Smartphone Users · December 21, 2023 · CHI 2025
- You Only Look at Screens: Multimodal Chain-of-Action Agents · September 20, 2023 · Findings of ACL 2024
- AutoDroid: LLM-powered Task Automation in Android · August 29, 2023 · MobiCom 2024
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI · May 23, 2022 · EMNLP 2022
- Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations · July 31, 2020 · ACL 2020 Demo Track
- PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations · August 30, 2019 · UIST 2019
- SUGILITE: Creating Multimodal Smartphone Automation by Demonstration · May 6, 2017 · CHI 2017
- VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation · April 23, 2026 · arXiv