Output Diversified Initialization for Adversarial Attacks

Adversarial examples are often constructed by iteratively refining a randomly perturbed input. To improve the diversity, and thereby the success rates, of such attacks, we propose Output Diversified Initialization (ODI), a novel random initialization strategy that can be combined with most existing white-box adversarial attacks. Instead of applying uniform perturbations in the input space, ODI seeks diversity in the output (logit) space of the target model. Empirically, we demonstrate that existing ℓ∞ and ℓ2 adversarial attacks with ODI become much more efficient on several datasets, including MNIST, CIFAR-10, and ImageNet, reducing the accuracy of recently proposed defense models by 1–17%. Moreover, PGD with ODI outperforms current state-of-the-art attacks against robust models while being roughly 50 times faster on CIFAR-10. The code is available at https://github.com/ermongroup/ODI/.
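Concretely, ODI perturbs the clean input for a few steps so as to maximize the projection of the model's logits onto a randomly drawn direction, and only then hands the diversified starting point to the main attack (e.g. PGD). Below is a minimal PyTorch sketch of this idea for an ℓ∞ threat model; the function name odi_initialization, the number of ODI steps, and the step size are illustrative assumptions rather than the authors' exact implementation (see the linked repository for that).

```python
import torch

def odi_initialization(model, x, eps, odi_steps=2, odi_step_size=None):
    """Sketch of Output Diversified Initialization (ODI) for an l_inf ball.

    Starting from a clean batch x, take a few signed-gradient steps that
    maximize <w, f(x)>, where w is a random direction in logit space, so
    that different random restarts begin from diverse model outputs.
    """
    if odi_step_size is None:
        odi_step_size = eps  # illustrative choice; the paper's schedule may differ

    x_adv = x.clone().detach()

    # Draw one random direction per example in the output (logit) space.
    with torch.no_grad():
        num_classes = model(x).shape[1]
    w = torch.empty(x.shape[0], num_classes, device=x.device).uniform_(-1.0, 1.0)

    for _ in range(odi_steps):
        x_adv.requires_grad_(True)
        # Maximize the projection of the logits onto the random direction w.
        loss = (model(x_adv) * w).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + odi_step_size * grad.sign()
            # Project back onto the l_inf ball around x and the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            x_adv = x_adv.clamp(0.0, 1.0)

    return x_adv.detach()
```

The returned x_adv would then replace the usual uniform random start of an attack such as PGD, with a fresh random direction w sampled for each restart.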
