Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems

Maria Movin, Claudia Hauff, Aron Henriksson, Panagiotis Papapetrou

🏛 Institutions: Stockholm University, Spotify
📅 Date: April 9, 2026
📑 Publisher: arXiv
💻 Env: Web
🔑 Keywords: evaluation, trace-level analysis, user simulation, search systems, behavioral alignment

TLDR

This paper presents a trace-level evaluation framework that compares human and GUI-agent behavior along three dimensions (task outcome, query formulation, and navigation) in a production audio-streaming search application. In a study with 39 human participants and a state-of-the-art GUI agent on 10 multi-hop search tasks, the agent matches human task success but follows search-centric, low-branching strategies, whereas humans rely on content-centric exploration.
