Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets

Adversarial training is by far the most successful strategy for improving the robustness of neural networks to adversarial attacks. Despite its success as a defense mechanism, adversarial training fails to generalize well to the unperturbed test set. We hypothesize that this poor generalization is a consequence of adversarial training with a uniform perturbation radius around every training sample. Samples close to the decision boundary can be morphed into a different class under a small perturbation budget, and enforcing large margins around these samples produces poorly shaped decision boundaries that generalize badly. Motivated by this hypothesis, we propose instance adaptive adversarial training -- a technique that enforces a sample-specific perturbation margin around every training sample. We show that with our approach, test accuracy on unperturbed samples improves with only a marginal drop in robustness. Extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the effectiveness of the proposed approach.
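To make the idea concrete, the following is a minimal, hypothetical PyTorch sketch of PGD-based adversarial training in which each training sample keeps its own L-infinity perturbation radius, rather than one uniform radius for the whole dataset. The names (pgd_attack, eps_per_sample) and the assumption that the data loader also yields sample indices are illustrative; the paper additionally specifies its own rule for adapting each sample's radius during training, which is not reproduced here.

    # Illustrative sketch, not the authors' exact algorithm:
    # PGD adversarial training with a per-sample perturbation radius.
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps, step_size, num_steps):
        # eps has shape (batch,): one radius per sample, broadcast over C, H, W.
        eps = eps.view(-1, 1, 1, 1)
        delta = torch.zeros_like(x).uniform_(-1.0, 1.0) * eps
        for _ in range(num_steps):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            # Signed gradient ascent step, projected onto each sample's own eps-ball.
            delta = (delta + step_size * grad.sign()).detach()
            delta = torch.max(torch.min(delta, eps), -eps)
            # Keep perturbed images in the valid pixel range [0, 1].
            delta = torch.clamp(x + delta, 0.0, 1.0) - x
        return (x + delta).detach()

    def train_epoch(model, loader, optimizer, eps_per_sample,
                    step_size=2.0 / 255, num_steps=10):
        model.train()
        for x, y, idx in loader:  # loader is assumed to also yield sample indices
            eps = eps_per_sample[idx]  # instance-adaptive radii, one per sample
            x_adv = pgd_attack(model, x, y, eps, step_size, num_steps)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()

Under this sketch, samples believed to sit near the decision boundary would be assigned a small entry in eps_per_sample, so adversarial training no longer forces a large margin around them, while samples far from the boundary retain a large radius and keep their robustness benefit.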
