GUI Agents Papers

Cross-Modal Content Optimization for Steering Web Agent Preferences

Tanqiu Jiang, Min Bai, Nikolaos Pappas, Yanjun Qi, Sandesh Swamy

🏛 Institutions
Stony Brook University, AWS AI Labs
📅 Date
October 4, 2025
📑 Publisher
arXiv
💻 Env
Web
TLDR

This paper introduces Cross-Modal Preference Steering (CPS), a black-box attack that jointly perturbs an item's image and text to bias web-agent ranking and selection decisions. Under a realistic threat model in which the attacker controls only their own listing metadata, CPS outperforms prior baselines across GPT-4.1, Qwen-2.5VL, and Pixtral-Large while being detected at much lower rates than those baselines.
