GUI Agents Papers

SCUBA: Salesforce Computer Use Benchmark

Yutong Dai, Krithika Ramakrishnan, Jing Gu, Matthew Fernandez, Yanqi Luo, Viraj Prabhu, Zhenyu Hu, Silvio Savarese, Caiming Xiong, Zeyuan Chen, Ran Xu

🏛 Institutions
Salesforce AI Research
📅 Date
September 30, 2025
📑 Publisher
ICLR 2026 (Poster)
💻 Env
General GUI
TLDR

SCUBA is a benchmark for computer-use agents (CUAs) on Salesforce customer relationship management (CRM) workflows. It comprises 300 task instances derived from real user interviews and spans three personas: administrator, sales, and service. Tasks run in Salesforce sandbox environments and are scored with interpretable milestone-based evaluation. The results show that enterprise tasks remain much harder than those in standard CUA benchmarks, especially for open models in zero-shot settings.
