Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets

Adversarial training is by far the most successful strategy for improving the robustness of neural networks to adversarial attacks. Despite its success as a defense mechanism, adversarial training fails to generalize well to the unperturbed test set. We hypothesize that this poor generalization is a consequence of adversarial training with a uniform perturbation radius around every training sample. Samples close to the decision boundary can be morphed into a different class under a small perturbation budget, and enforcing large margins around these samples produces poorly shaped decision boundaries that generalize badly. Motivated by this hypothesis, we propose instance adaptive adversarial training -- a technique that enforces a sample-specific perturbation margin around every training sample. We show that with our approach, test accuracy on unperturbed samples improves with only a marginal drop in robustness. Extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the effectiveness of the proposed approach.
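To make the idea concrete, the following is a minimal, hypothetical PyTorch sketch of PGD-based adversarial training in which each training sample keeps its own L-infinity perturbation radius, rather than one uniform radius for the whole dataset. The names (pgd_attack, eps_per_sample) and the assumption that the data loader also yields sample indices are illustrative; the paper additionally specifies its own rule for adapting each sample's radius during training, which is not reproduced here.

    # Illustrative sketch, not the authors' exact algorithm:
    # PGD adversarial training with a per-sample perturbation radius.
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps, step_size, num_steps):
        # eps has shape (batch,): one radius per sample, broadcast over C, H, W.
        eps = eps.view(-1, 1, 1, 1)
        delta = torch.zeros_like(x).uniform_(-1.0, 1.0) * eps
        for _ in range(num_steps):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            # Signed gradient ascent step, projected onto each sample's own eps-ball.
            delta = (delta + step_size * grad.sign()).detach()
            delta = torch.max(torch.min(delta, eps), -eps)
            # Keep perturbed images in the valid pixel range [0, 1].
            delta = torch.clamp(x + delta, 0.0, 1.0) - x
        return (x + delta).detach()

    def train_epoch(model, loader, optimizer, eps_per_sample,
                    step_size=2.0 / 255, num_steps=10):
        model.train()
        for x, y, idx in loader:  # loader is assumed to also yield sample indices
            eps = eps_per_sample[idx]  # instance-adaptive radii, one per sample
            x_adv = pgd_attack(model, x, y, eps, step_size, num_steps)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()

Under this sketch, samples believed to sit near the decision boundary would be assigned a small entry in eps_per_sample, so adversarial training no longer forces a large margin around them, while samples far from the boundary retain a large radius and keep their robustness benefit.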
