Poster in Workshop: Safe Generative AI
Designing Physical-World Universal Attacks on Vision Transformers
Mingzhen Shao
Recent studies have highlighted the vulnerability of Vision Transformers (ViTs) to adversarial attacks. However, existing attack methods often overlook the differences between ViTs and CNNs, which makes it difficult to carry attacks over from the digital to the physical world. In this work, we introduce a novel adversarial patch generation method, G-Patch, and present the first physical-world universal attack for ViTs. Unlike previous methods, our approach decouples the attacker's location from the ViT patches, enabling the model to design attacks that can occur at random locations in the physical world. To provide sufficient learning capacity for this more complex setting, we employ a sub-network to craft potential attackers. Our ablation study demonstrates that the previous direct optimization method fails to provide a reliable attack when random locations are considered. Our synthetic tests simulate various types of physical-world noise, with G-Patch achieving a targeted attack success rate (ASR) of over 90%, while other approaches exhibit a negligible ASR of less than 10%. Additionally, a black-box attack is designed to demonstrate G-Patch's transferability across different models. A series of challenging physical-world experiments further underscores its robustness in practical deployments.
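The two quantities the abstract relies on, random attacker placement and targeted ASR, can be illustrated with a minimal NumPy sketch. This is not the G-Patch implementation: the function names `paste_patch` and `targeted_asr`, the 224x224 image size, and the 32x32 patch size are all assumptions chosen for illustration. The patch offset is drawn uniformly over pixel positions, so it is generally not aligned with any ViT patch grid.

```python
import numpy as np

def paste_patch(image, patch, rng):
    """Paste `patch` (h, w, C) into a copy of `image` (H, W, C) at a random offset.

    The offset is sampled uniformly over all pixel positions where the patch
    fits, so it is not tied to a ViT patch grid (hypothetical helper).
    """
    H, W, _ = image.shape
    h, w, _ = patch.shape
    top = int(rng.integers(0, H - h + 1))   # upper bound is exclusive
    left = int(rng.integers(0, W - w + 1))
    out = image.copy()
    out[top:top + h, left:left + w] = patch
    return out, (top, left)

def targeted_asr(predictions, target_class):
    """Fraction of predictions equal to the attacker's target class."""
    predictions = np.asarray(predictions)
    return float(np.mean(predictions == target_class))

# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
image = np.zeros((224, 224, 3), dtype=np.float32)
patch = np.ones((32, 32, 3), dtype=np.float32)
patched, (top, left) = paste_patch(image, patch, rng)

# Toy predictions: three of four patched inputs are classified as target class 1.
asr = targeted_asr([1, 2, 1, 1], target_class=1)
```

In a real evaluation, the predictions would come from running the victim model on many patched images, each with a freshly sampled location and simulated physical-world noise.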