GUI Agents Papers

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, Wanjun Zhong, Kuanye Li, Jiale Yang, Yu Miao, Woyu Lin, Longxiang Liu, Xu Jiang, Qianli Ma, Jingyu Li, Xiaojun Xiao, Kai Cai, Chuang Li, Yaowei Zheng, Chaolin Jin, Chen Li, Xiao Zhou, Minchao Wang, Haoli Chen, Zhaojian Li, Haihua Yang, Haifeng Liu, Feng Lin, Tao Peng, Xin Liu, Guang Shi

🏛 Institutions
ByteDance Seed, Tsinghua University
📅 Date
January 21, 2025
📑 Publisher
arXiv
💻 Env
Desktop, Mobile, Web
TLDR

UI-TARS is an end-to-end GUI agent model that takes only screenshots as input and acts on them directly, rather than wrapping a proprietary model in a prompt-driven workflow. It combines enhanced perception, a unified action space shared across platforms, deliberate multi-step reasoning, and iterative training on reflective online traces. The authors report strong performance on more than ten GUI benchmarks.
