AFRAgent : An Adaptive Feature Renormalization Based High Resolution Aware GUI agent
Neeraj Anand, Rishabh Jain, Sohan Patnaik, Balaji Krishnamurthy, Mausoom Sarkar
- 🏛 Institutions: Adobe
- 📅 Date: November 30, 2025
- 📑 Publisher: WACV 2026
- 💻 Env: Mobile
TLDR
AFRAgent targets the loss of spatial detail that hurts mobile GUI automation models built on low-resolution vision features. It adds an adaptive feature renormalization module that enriches InstructBLIP image embeddings with high-resolution information, and it reports state-of-the-art results on Meta-GUI and AITW with a much smaller model than prior baselines.
Related papers
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents · January 21, 2025 · arXiv
- Ponder & Press: Advancing Visual GUI Agent towards General Computer Control · December 2, 2024 · Findings of ACL 2025
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · October 30, 2024 · ICLR 2025 (Spotlight)
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · July 29, 2025 · CVPR 2026 Findings
- Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems · March 9, 2025 · arXiv
- Aria-UI: Visual Grounding for GUI Instructions · December 20, 2024 · Findings of ACL 2025