Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar

🏛 Institutions: CMU, Scribe, UIUC, University of Toronto, UC Berkeley, The AGI Company, New York University
📅 Date: June 9, 2025
📑 Publisher: SEA @ NeurIPS 2025 (Oral)
💻 Env: Web
🔑 Keywords: reinforcement learning, test-time interaction, interaction scaling, exploration, backtracking, TTI

TLDR

This paper argues that interactive web agents benefit more from scaling how long they can interact with the environment than from merely lengthening pre-action reasoning traces. It introduces Test-Time Interaction (TTI), an online RL method that increases rollout horizons and yields stronger WebVoyager and WebArena agents with richer exploration and replanning behavior.
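As a rough illustration of what "increasing rollout horizons" could mean in practice, the sketch below caps each rollout at a step budget that grows over training. It is not the paper's implementation: the linear schedule, the names `horizon_schedule` and `rollout`, the bounds 10 and 30, and the toy counter task are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def horizon_schedule(iteration: int, total_iterations: int,
                     h_min: int = 10, h_max: int = 30) -> int:
    """Hypothetical linear schedule for the per-rollout step budget.

    Assumption: the budget grows linearly from h_min to h_max. The summary
    only says that rollout horizons are increased, not how.
    """
    if total_iterations <= 1:
        return h_max
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return h_min + round(frac * (h_max - h_min))


def rollout(act: Callable[[int], int],
            step: Callable[[int, int], Tuple[int, bool]],
            start: int, horizon: int) -> Tuple[List[int], bool]:
    """Interact with the environment for at most `horizon` steps."""
    state = start
    actions: List[int] = []
    for _ in range(horizon):
        action = act(state)
        actions.append(action)
        state, done = step(state, action)
        if done:
            return actions, True   # task finished within the budget
    return actions, False          # budget exhausted before finishing


# Toy task (illustrative only): reach a counter value of 15, one unit per step.
def toy_act(state: int) -> int:
    return 1


def toy_step(state: int, action: int) -> Tuple[int, bool]:
    new_state = state + action
    return new_state, new_state >= 15


short_actions, short_ok = rollout(toy_act, toy_step, 0, horizon_schedule(0, 5))
long_actions, long_ok = rollout(toy_act, toy_step, 0, horizon_schedule(4, 5))
```

In this toy setting the early, short budget runs out before the task completes, while the later, longer budget leaves room to finish. That is the only point the sketch is meant to convey.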
