Diversity can be Transferred: Output Diversification for White- and Black-box Attacks

Adversarial attacks often involve random perturbations of the inputs drawn from uniform or Gaussian distributions, e.g., to initialize optimization-based white-box attacks or to generate update directions in black-box attacks. These simple perturbations, however, can be sub-optimal because they are agnostic to the model being attacked. To improve the efficiency of these attacks, we propose Output Diversified Sampling (ODS), a novel sampling strategy that attempts to maximize diversity in the target model's outputs among the generated samples. Although ODS is a gradient-based strategy, the diversity it provides is transferable and helps both white-box and black-box attacks via surrogate models. Empirically, we demonstrate that ODS significantly improves the performance of existing white-box and black-box attacks. In particular, ODS reduces the number of queries needed for state-of-the-art black-box attacks on ImageNet by a factor of two.
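
The abstract does not spell out the sampling procedure, but the stated idea of using gradients to maximize diversity in the model's outputs suggests a sketch like the one below: draw a random weight vector w over the logits and move the input along the normalized gradient of the weighted logit score w^T f(x), so that different draws of w push the outputs in different directions. The function name ods_direction, the num_classes argument, and the use of PyTorch are illustrative assumptions rather than the paper's reference implementation.

```python
import torch

def ods_direction(model, x, num_classes, device="cpu"):
    """Sketch of an output-diversified perturbation direction for a batch x.

    Samples a random weight vector w over the model's logits and follows the
    input gradient of <w, f(x)>; repeated draws of w yield perturbations that
    spread out in the model's output space rather than in input space.
    """
    x = x.clone().detach().to(device).requires_grad_(True)
    # Random direction in output (logit) space, one draw per sample in the batch.
    w = torch.empty(x.shape[0], num_classes, device=device).uniform_(-1.0, 1.0)
    logits = model(x)                      # shape: (batch, num_classes)
    score = (w * logits).sum()             # <w, f(x)> summed over the batch
    grad = torch.autograd.grad(score, x)[0]
    # Normalize per sample so each direction has unit L2 norm.
    flat = grad.view(grad.shape[0], -1)
    norm = flat.norm(dim=1).clamp_min(1e-12).view(-1, *([1] * (grad.dim() - 1)))
    return grad / norm
```

Under this reading, a white-box attack could use x + eps * ods_direction(model, x, num_classes) as a diversified restart point, and a black-box attack could compute the direction on a surrogate model in place of the target, which is where the transferability claimed in the abstract would come in.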
