AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, Yuxiao Dong
- 🏛 Institutions
- Tsinghua, PKU, Zhipu
- 📅 Date
- October 31, 2024
- 📑 Publisher
- ACL 2025
- 💻 Env
- Mobile
- 🔑 Keywords
TLDR
AndroidLab provides a reproducible Android agent environment and a benchmark with predefined virtual devices, a shared action space, and 138 tasks across nine apps. The authors also build an Android Instruction dataset from that environment and show that training on it materially improves both open LLM and VLM mobile agents.
Related papers (24)
- PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent · March 31, 2026 · arXiv
- SecAgent: Efficient Mobile GUI Agent with Semantic Context · March 9, 2026 · arXiv
- Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization · February 24, 2026 · arXiv
- AmbiBench: Benchmarking Mobile GUI Agents Beyond One-Shot Instructions in the Wild · February 12, 2026 · arXiv
- MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments · February 3, 2026 · arXiv
- SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis · January 26, 2026 · arXiv
- SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks · January 26, 2026 · ICLR 2026 (Poster)
- MobileWorldBench: Towards Semantic World Modeling For Mobile Agents · December 16, 2025 · arXiv
- NaturalGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset · August 2, 2025 · arXiv
- FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents · June 9, 2025 · ICLR 2026 (Poster)
- LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark · April 18, 2025 · arXiv
- GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding · June 16, 2024 · ICLR 2025 (Poster)
- LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Task Automation · April 12, 2024 · UIST 2024
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs · April 8, 2024 · ECCV 2024 (Poster)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · January 17, 2024 · ACL 2024
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation · November 13, 2023 · arXiv
- Android in the Wild: A Large-Scale Dataset for Android Device Control · July 19, 2023 · NeurIPS 2023 Datasets and Benchmarks Track
- A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility · February 4, 2022 · ECCV 2022
- Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning · August 6, 2021 · UIST 2021
- Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements · November 30, 2020 · EMNLP 2020
- Mapping Natural Language Instructions to Mobile UI Action Sequences · July 31, 2020 · ACL 2020
- WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark · April 13, 2026 · arXiv
- Gym-Anything: Turn any Software into an Agent Environment · April 7, 2026 · arXiv
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post