RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

Suyu Ye, Haojun Shi, Darren Shih, Hyokun Yun, Tanya G. Roosta, Tianmin Shu

🏛 Institutions: JHU, Amazon
📅 Date: April 14, 2025
📑 Publisher: AAAI 2026
💻 Env: Web
🔑 Keywords: benchmark, dataset, long-horizon assistance, sequential instructions, ambiguous user intent, user routines, RealWebAssist

TLDR

RealWebAssist benchmarks long-horizon web assistance with sequential instructions collected from real users rather than isolated single-task prompts. Its dataset spans 1,885 instructions across 107 tasks on 66 websites and highlights challenges such as ambiguous intent, evolving user goals, routine understanding, and grounding actions to the right GUI elements.
