

Poster

Enhancing Transferable Adversarial Attacks on Vision Transformers through Gradient Normalization Scaling and High-Frequency Adaptation

Zhiyu Zhu · Xinyi Wang · Zhibo Jin · Jiayu Zhang · Huaming Chen

Halle B
Wed 8 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Variants of Vision Transformer (ViT) models are now widely applied in fields such as computer vision, natural language processing, and cross-modal applications. Existing transferable attacks on these models rest on the rationale that applying gradient propagation and gradient regularization across different functional regions of the transformer structure can enhance the transferability of adversarial samples. In practice, however, substantial gradient disparities exist even within the same functional region across different layers. In this paper, we introduce a novel Gradient Normalization Scaling method for fine-grained gradient editing to enhance the transferability of adversarial attacks on ViTs. More importantly, we highlight that ViTs, unlike conventional CNNs, exhibit distinct attention points in the frequency domain, and we leverage this insight to explore the frequency domain and further enhance the algorithm's transferability. Through extensive experiments on various ViT variants and traditional CNN models, we show that the new approach achieves state-of-the-art performance, with average performance improvements of 33.54% and 42.05% on ViT and CNN models, respectively. Our code is available at: https://anonymous.4open.science/r/GNS-HFE-DD2D/.
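The abstract names two mechanisms, per-layer gradient normalization scaling and high-frequency adaptation, without giving formulas. The PyTorch sketch below is a minimal illustration of those two ideas as described above, assuming a timm-style ViT whose transformer blocks live in model.blocks; the names gradient_normalization_hook, high_frequency_mask, and the cutoff knob are hypothetical simplifications for illustration, not the authors' released implementation (which is at the repository URL above).

    # Assumption-laden sketch of the two ideas named in the abstract;
    # not the authors' code. Requires PyTorch.
    import torch
    import torch.fft


    def gradient_normalization_hook(scale: float = 1.0):
        """Forward hook that rescales a block's backward gradient.

        Normalizing the gradient flowing through each transformer block to a
        common L2 norm (then multiplying by `scale`) is one way to damp the
        per-layer gradient disparities the abstract describes; the paper's
        exact normalization may differ.
        """
        def forward_hook(module, inputs, output):
            def grad_hook(grad):
                # Per-sample L2 norm over all non-batch dimensions.
                norm = grad.flatten(1).norm(p=2, dim=1).clamp_min(1e-12)
                return grad / norm.view(-1, *([1] * (grad.dim() - 1))) * scale
            if output.requires_grad:
                output.register_hook(grad_hook)
        return forward_hook


    def high_frequency_mask(grad: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
        """Emphasize high-frequency components of an image-space gradient.

        A crude stand-in for high-frequency adaptation: zero out the central
        (low-frequency) band of the gradient's 2-D spectrum. `cutoff` is the
        fraction of the spectrum treated as "low" (a hypothetical knob).
        """
        spec = torch.fft.fftshift(torch.fft.fft2(grad), dim=(-2, -1))
        h, w = grad.shape[-2:]
        cy, cx = h // 2, w // 2
        ry, rx = int(h * cutoff / 2), int(w * cutoff / 2)
        spec[..., cy - ry:cy + ry, cx - rx:cx + rx] = 0  # suppress low band
        return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real


    # Usage (assuming a timm-style ViT with blocks in `model.blocks`):
    #   for block in model.blocks:
    #       block.register_forward_hook(gradient_normalization_hook())
    #   ...compute the attack loss on x_adv, call loss.backward(), then
    #   step with high_frequency_mask(x_adv.grad).sign().

The hooks touch only backward gradients, so forward predictions are unchanged; the frequency mask is applied once, in image space, after backpropagation.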
