EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents
Ying Mo, Yu Bai, Dapeng Sun, Yuqian Shi, Yukai Miao, Li Chen, Dan Li
- 🏛 Institutions
  - Zhongguancun Laboratory, Tsinghua
- 📅 Date
  - January 25, 2026
- 📑 Publisher
  - arXiv
- 💻 Env
  - Desktop
TLDR
EntWorld introduces a verifiable environment for enterprise GUI agents and a 1,756-task benchmark spanning six business domains, including CRM, ITIL, and ERP. It synthesizes workflows from database schemas and uses deterministic SQL-based verification instead of visual matching. Current top models still trail human performance by a large margin.
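To make the idea of SQL-based deterministic verification concrete, here is a minimal sketch. It is not EntWorld's actual verifier: the `tickets` table, its columns, the `verify_task` helper, and the use of an in-memory SQLite database are all assumptions chosen for illustration. The point is only that success is decided by querying backend state after the agent acts, not by comparing screenshots.

```python
import sqlite3


def verify_task(conn, check_sql, expected_rows):
    """Return True when the check query yields exactly the expected rows.

    Hypothetical helper: compares database state, not pixels.
    """
    actual = conn.execute(check_sql).fetchall()
    return sorted(actual) == sorted(expected_rows)


def build_demo_db():
    """Create a toy ticketing backend (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, assignee TEXT)"
    )
    conn.executemany(
        "INSERT INTO tickets VALUES (?, ?, ?)",
        [(1, "open", "alice"), (2, "open", "bob")],
    )
    conn.commit()
    return conn


conn = build_demo_db()

# Task (illustrative): "Close ticket 2 and reassign it to carol."
CHECK_SQL = "SELECT status, assignee FROM tickets WHERE id = 2"
EXPECTED = [("closed", "carol")]

# Before the agent acts, the check fails.
before = verify_task(conn, CHECK_SQL, EXPECTED)

# Stand-in for the backend write that the agent's GUI actions would trigger.
conn.execute("UPDATE tickets SET status = 'closed', assignee = 'carol' WHERE id = 2")
conn.commit()

# After the write, the same query passes deterministically.
after = verify_task(conn, CHECK_SQL, EXPECTED)

print(before, after)  # False True
```

Because the check reads only the final database rows, it gives the same verdict regardless of how the interface renders or which path the agent took through the GUI.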
Related papers
- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows · May 26, 2025 · ICLR 2026 (Poster)
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? · March 11, 2024 · ICML 2024
- WebArena: A Realistic Web Environment for Building Autonomous Agents · July 25, 2023 · NeurIPS 2024 (Oral)
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents · July 31, 2022 · NeurIPS 2022
- WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments · April 30, 2026 · arXiv
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents · April 12, 2026 · arXiv