GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph
Mingxian Yu, Siqi Luo, Xu Chen
- 🏛 Institutions
- Sun Yat-sen University
- 📅 Date
- January 24, 2026
- 📑 Publisher
- Journal of Intelligent Computing and Networking
- 💻 Env
- Mobile
TLDR
GraphPilot builds an app-specific knowledge graph that encodes page functions, element roles, and transition rules, then uses it to plan a near-complete action sequence in close to a single LLM query. On DroidTask, it improves task completion while sharply reducing latency and the number of LLM calls relative to stepwise mobile agents.
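To make the idea of an app-specific graph concrete, here is a minimal sketch in Python. It is not GraphPilot's implementation: the `Page` and `AppGraph` classes, their fields, and the example app are all hypothetical, and a plain breadth-first search stands in for the LLM planning step so the example stays runnable without a model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Page:
    """One app screen (hypothetical schema, not taken from the paper)."""
    name: str
    function: str                                  # what the page is for
    elements: dict = field(default_factory=dict)   # element id -> role


@dataclass
class AppGraph:
    """Pages plus transition rules of the form (page, element) -> next page."""
    pages: dict = field(default_factory=dict)
    transitions: dict = field(default_factory=dict)

    def add_page(self, page):
        self.pages[page.name] = page

    def add_transition(self, src, element, dst):
        # Only allow transitions triggered by an element known on the source page.
        if element not in self.pages[src].elements:
            raise KeyError(f"{element!r} is not an element of page {src!r}")
        self.transitions[(src, element)] = dst

    def plan(self, start, goal):
        """Return a list of (page, element) taps from start to goal, or None.

        Breadth-first search is an assumption made for this sketch; it is a
        stand-in for the single LLM query described in the summary above.
        """
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            page, actions = queue.popleft()
            if page == goal:
                return actions
            for (src, element), dst in self.transitions.items():
                if src == page and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, actions + [(src, element)]))
        return None


# A made-up three-page app used only for illustration.
graph = AppGraph()
graph.add_page(Page("home", "entry screen", {"menu_button": "opens settings"}))
graph.add_page(Page("settings", "app settings", {"display_item": "opens display options"}))
graph.add_page(Page("display", "display options", {}))
graph.add_transition("home", "menu_button", "settings")
graph.add_transition("settings", "display_item", "display")

print(graph.plan("home", "display"))
```

Because the whole action sequence comes out of one planning pass over the graph, an agent built this way would not need to consult a model after every tap, which is the kind of saving in latency and LLM calls the summary refers to.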
Related papers (24)
- UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents · May 28, 2026 · arXiv
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · May 25, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- GUITester: Enabling GUI Agents for Exploratory Defect Discovery · January 8, 2026 · arXiv
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv
- CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs · October 17, 2025 · NeurIPS 2025 (Poster)
- Agent-SAMA: State-Aware Mobile Assistant · May 29, 2025 · AAAI 2026
- BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism · May 27, 2025 · EMNLP 2025 (Oral)
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent · May 22, 2025 · ACL 2025
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent · May 20, 2025 · arXiv
- ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation · April 30, 2025 · NAACL 2025 (Poster)
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation · April 30, 2025 · NAACL 2025 (System Demonstrations)
- LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark · April 18, 2025 · arXiv
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents · October 18, 2024 · ICLR 2025 (Poster)
- ClickAgent: Enhancing UI Location Capabilities of Autonomous Agents · October 9, 2024 · SIGDIAL 2025
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions · August 5, 2024 · arXiv
- MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices · July 4, 2024 · arXiv
- AppAgent: Multimodal Agents as Smartphone Users · December 21, 2023 · CHI 2025
- You Only Look at Screens: Multimodal Chain-of-Action Agents · September 20, 2023 · Findings of ACL 2024
- AutoDroid: LLM-powered Task Automation in Android · August 29, 2023 · MobiCom 2024
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI · May 23, 2022 · EMNLP 2022
- Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations · July 31, 2020 · ACL 2020 Demo Track
- PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations · August 30, 2019 · UIST 2019
- SUGILITE: Creating Multimodal Smartphone Automation by Demonstration · May 6, 2017 · CHI 2017