Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses

Adversarial training (AT) is widely considered one of the most reliable defenses against adversarial attacks. However, models trained with AT sacrifice standard accuracy and generalize poorly to novel attacks. Recent works report improved generalization from adversarial samples crafted under novel threat models, such as the on-manifold threat model or the neural perceptual threat model. However, the former requires exact manifold information, while the latter requires an algorithmic relaxation. Motivated by these considerations, we exploit the underlying manifold information with a Normalizing Flow, ensuring that the exact-manifold assumption holds. Moreover, we propose a novel threat model, the Joint Space Threat Model (JSTM), which can be viewed as a special case of the neural perceptual threat model that does not require additional relaxation to craft the corresponding adversarial attacks. Under JSTM, we develop novel adversarial attacks and defenses. The mixup strategy improves the standard accuracy of neural networks but sacrifices robustness when combined with AT. To tackle this issue, we propose the Robust Mixup strategy, in which we maximize the adversity of the interpolated images, gaining robustness while preventing overfitting. Our experiments show that Interpolated Joint Space Adversarial Training (IJSAT) achieves strong standard accuracy, robustness, and generalization on the CIFAR-10/100, OM-ImageNet, and CIFAR-10-C datasets. IJSAT is also flexible: it can serve as a data-augmentation method to improve standard accuracy, and it can be combined with many existing AT approaches to improve robustness.
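
Although the abstract describes JSTM only at a high level, the attack it implies can be sketched concretely. The following PyTorch fragment is a minimal, hypothetical PGD-style attack that perturbs an image jointly in pixel space and in the latent space of a pretrained normalizing flow (e.g., Glow or RealNVP). The encode/decode interface, the two epsilon budgets, and the step sizes are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def joint_space_pgd(classifier, flow, x, y,
                        eps_x=8 / 255, eps_z=0.05,
                        alpha_x=2 / 255, alpha_z=0.01, steps=10):
        # Latent code of the clean image under the frozen flow
        # (hypothetical encode/decode interface).
        z0 = flow.encode(x).detach()
        # Independent perturbations in pixel space and latent space.
        delta_x = torch.zeros_like(x).uniform_(-eps_x, eps_x).requires_grad_(True)
        delta_z = torch.zeros_like(z0).uniform_(-eps_z, eps_z).requires_grad_(True)

        for _ in range(steps):
            x_adv = (flow.decode(z0 + delta_z) + delta_x).clamp(0.0, 1.0)
            loss = F.cross_entropy(classifier(x_adv), y)
            g_x, g_z = torch.autograd.grad(loss, [delta_x, delta_z])
            with torch.no_grad():
                # Ascend the loss, then project each perturbation
                # back into its own L-infinity ball.
                delta_x += alpha_x * g_x.sign()
                delta_x.clamp_(-eps_x, eps_x)
                delta_z += alpha_z * g_z.sign()
                delta_z.clamp_(-eps_z, eps_z)

        return (flow.decode(z0 + delta_z) + delta_x).clamp(0.0, 1.0).detach()

Because the flow is exactly invertible, the latent component of the perturbation stays on the learned data manifold by construction, which is how the exact-manifold assumption can hold without the relaxation the neural perceptual threat model needs.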
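
Robust Mixup can be illustrated in the same spirit. One plausible instantiation of "maximizing the adversity of the interpolated images" is to sample several candidate mixup coefficients and keep the interpolation that is hardest for the current model; the candidate-search scheme below is an assumption for illustration only, and the paper's precise criterion is defined in its method section.

    import torch
    import torch.nn.functional as F

    def robust_mixup_loss(model, x, y, n_candidates=5, beta_alpha=1.0):
        # Pair each example with a random partner, as in standard mixup.
        perm = torch.randperm(x.size(0), device=x.device)
        x2, y2 = x[perm], y[perm]

        best_loss, best = -1.0, None
        for _ in range(n_candidates):
            lam = torch.distributions.Beta(beta_alpha, beta_alpha).sample().item()
            x_mix = lam * x + (1 - lam) * x2
            with torch.no_grad():
                probe = model(x_mix)
                loss = (lam * F.cross_entropy(probe, y)
                        + (1 - lam) * F.cross_entropy(probe, y2)).item()
            # Keep the most adversarial (highest-loss) interpolation.
            if loss > best_loss:
                best_loss, best = loss, (x_mix, lam)

        x_mix, lam = best
        logits = model(x_mix)  # trainable forward pass on the hardest mix
        return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y2)

In the full IJSAT pipeline one would expect the interpolated images to also be attacked, e.g., via a joint-space attack like the sketch above, following the interpolated adversarial training recipe the method's name suggests.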
