Interpolated Adversarial Training: Achieving Robust Neural Networks Without Sacrificing Too Much Accuracy

Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods for improving adversarial robustness (such as adversarial training) greatly hurt generalization performance on unperturbed data. This trade-off could strongly influence how adversarial robustness is adopted in real-world systems (i.e., many practitioners may forgo robustness entirely if it costs accuracy on unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods within the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with Interpolated Adversarial Training we retain adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in standard error for the robust model is reduced from 178.1% to just 45.5%.
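
To make the procedure concrete, below is a minimal sketch of one plausible training step, assuming a PyTorch classifier, input mixup as the interpolation method, and an untargeted L-infinity PGD adversary. The helpers (mixup, pgd_attack, soft_ce, iat_step) and all hyperparameter values (eps, step size, number of PGD steps, the mixup alpha) are illustrative assumptions for this sketch, not the paper's exact configuration.

    import torch
    import torch.nn.functional as F

    def mixup(x, y, num_classes, alpha=1.0):
        # Interpolate the batch with a shuffled copy of itself (input mixup).
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(x.size(0))
        y_onehot = F.one_hot(y, num_classes).float()
        x_mix = lam * x + (1 - lam) * x[perm]
        y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
        return x_mix, y_mix

    def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, n_steps=7):
        # Untargeted L-infinity PGD, starting from a random point in the eps-ball.
        # Hyperparameters here are common CIFAR-10 choices, assumed for illustration.
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
        for _ in range(n_steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + step * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()

    def soft_ce(logits, soft_targets):
        # Cross-entropy against interpolated (soft) labels.
        return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

    def iat_step(model, optimizer, x, y, num_classes=10):
        # One Interpolated Adversarial Training step: average the mixup loss on
        # the clean batch and the mixup loss on its adversarial counterpart.
        model.train()
        x_adv = pgd_attack(model, x, y)
        x_c, y_c = mixup(x, y, num_classes)
        x_a, y_a = mixup(x_adv, y, num_classes)
        loss = 0.5 * (soft_ce(model(x_c), y_c) + soft_ce(model(x_a), y_a))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The key structural point is that interpolation is applied to both the clean and the adversarial batch before the two losses are averaged; replacing input mixup with a hidden-state interpolation such as Manifold Mixup would change only the mixup helper, not the shape of the step.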
