GUI Agents Papers

HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks

Suhana Bedi, Ryan Welch, Ethan Steinberg, Michael Wornow, Taeil Matthew Kim, Haroun Ahmed, Peter Sterling, Bravim Purohit, Qurat Akram, Angelic Acosta, Esther Nubla, Pritika Sharma, Michael A. Pfeffer, Sanmi Koyejo, Nigam H. Shah

🏛 Institutions
Stanford
📅 Date
April 10, 2026
📑 Publisher
arXiv
💻 Env
Desktop
🔑 Keywords
TLDR

HealthAdminBench evaluates computer-use agents on healthcare administration using 4 realistic GUI environments (an EHR, two payer portals, and a fax system) and 135 expert-defined tasks decomposed into 1,698 subtasks. The best agent (Claude Opus 4.6 CUA) reaches only 36.3% end-to-end task success despite 82.8% subtask success, showing that current agents remain well short of real-world reliability.
