PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Yanheng He, Jiahe Jin, Shijie Xia, Jiadi Su, Runze Fan, Haoyang Zou, Xiangkun Hu, Pengfei Liu
- 🏛 Institutions: SJTU, GAIR
- 📅 Date: December 23, 2024
- 📑 Publisher: arXiv
- 💻 Env: Desktop
TLDR
PC Agent studies how to transfer human cognitive processes into desktop agents for complex digital work rather than short isolated tasks. It introduces PC Tracker for collecting cognitive interaction traces, a two-stage cognition-completion pipeline, and a planning-plus-grounding multi-agent system, showing promising results on long PowerPoint workflows with limited data.
Related papers (24)
- AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant · October 24, 2024 · Findings of ACL 2025
- VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation · April 23, 2026 · arXiv
- EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning · April 10, 2026 · arXiv
- ShowUI-Aloha: Human-Taught GUI Agent · January 12, 2026 · arXiv
- Surfer 2: The Next Generation of Cross-Platform Computer Use Agents · October 22, 2025 · arXiv
- BIMgent: Towards Autonomous Building Modeling via Computer-use Agents · June 8, 2025 · ICML 2025 Workshop on Computer-use Agents
- LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS · May 24, 2025 · arXiv
- UFO2: The Desktop AgentOS · April 20, 2025 · arXiv
- WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point · February 12, 2025 · arXiv
- Agent S: An Open Agentic Framework that Uses Computers Like a Human · October 10, 2024 · ICLR 2025 (Poster)
- AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents · September 26, 2024 · ACL 2025
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement · February 12, 2024 · LLMAgents @ ICLR 2024
- Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control · June 13, 2023 · ICLR 2024 (Poster)
- MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · May 25, 2026 · arXiv
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents · April 13, 2026 · arXiv
- OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding · February 25, 2026 · arXiv
- M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining · February 5, 2026 · ICLR 2026 (Poster)
- LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent · January 26, 2026 · ICLR 2026 (Poster)
- GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph · January 24, 2026 · Journal of Intelligent Computing and Networking
- MagicGUI-RMS: A Multi-Agent Reward Model System for Self-Evolving GUI Agents via Automated Feedback Reflux · January 19, 2026 · arXiv
- ColorBrowserAgent: Complex Long-Horizon Browser Agent with Adaptive Knowledge Evolution · January 12, 2026 · arXiv
- GUITester: Enabling GUI Agents for Exploratory Defect Discovery · January 8, 2026 · arXiv
- WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation · October 26, 2025 · NeurIPS 2025 Workshop on Language Agents and World Models
- PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction · October 17, 2025 · arXiv