Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar
- 🏛 Institutions: CMU, Scribe, UIUC, University of Toronto, UC Berkeley, The AGI Company, New York University
- 📅 Date: June 9, 2025
- 📑 Publisher: SEA @ NeurIPS 2025 (Oral)
- 💻 Env: Web
TLDR
This paper argues that interactive web agents benefit more from scaling how long they can interact with the environment than from merely lengthening pre-action reasoning traces. It introduces Test-Time Interaction (TTI), an online RL method that increases rollout horizons and yields stronger WebVoyager and WebArena agents with richer exploration and replanning behavior.
Related papers
- WebArena-Infinity: Generating Browser Environments with Verifiable Tasks at Scale · March 2026 · Blog Post
- WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents · March 5, 2026 · arXiv
- OpAgent: Operator Agent for Web Navigation · February 14, 2026 · arXiv
- WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks · January 5, 2026 · arXiv
- WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment · December 14, 2025 · arXiv
- WebServ: A Browser-Server Environment for Efficient Training of Reinforcement Learning-based Web Agents at Scale · October 17, 2025 · arXiv