Training Robust Deep Neural Networks via Adversarial Noise Propagation

Deep neural networks are vulnerable in practice to noise such as adversarial examples and common corruptions. Numerous adversarial defense methods have been developed and do improve robustness against adversarial examples. However, because most of them rely only on training with noise-mixed data, they still fail to defend against broader classes of noise. Motivated by the observation that hidden layers play a critical role in maintaining a robust model, this paper proposes a simple yet powerful training algorithm named Adversarial Noise Propagation (ANP), which injects diversified noise into the hidden layers in a layer-wise manner. We show that ANP can be implemented efficiently by exploiting the backward-forward structure of standard training for deep models. To better understand the behaviors and contributions of hidden layers, we further analyze them from the perspectives of hidden-representation insensitivity and alignment with human visual perception. Extensive experiments on MNIST, CIFAR-10, CIFAR-10-C, CIFAR-10-P and ImageNet demonstrate that ANP yields strong robustness against generalized noise, including both adversarial and corrupted inputs, and significantly outperforms various adversarial defense methods.
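
To illustrate the general idea of layer-wise noise injection, the following is a minimal, hedged sketch in PyTorch: hidden activations are first exposed via forward hooks, gradients with respect to those activations are taken from a backward pass, and sign-scaled noise is then added to each hidden layer in a second forward pass before the weight update. The attribute name `model.hidden_layers`, the function `anp_train_step`, and the step size `eta` are illustrative assumptions, not the authors' implementation; the exact ANP algorithm (e.g., its noise schedule and accumulation) may differ.

```python
# Hedged sketch of layer-wise adversarial noise injection during training.
# Assumes a PyTorch model exposing the layers to perturb as `model.hidden_layers`
# (an iterable of nn.Module); all names and hyperparameters are illustrative.

import torch
import torch.nn.functional as F

def anp_train_step(model, x, y, optimizer, eta=0.01):
    """One training step: perturb hidden activations with gradient-sign
    noise obtained from a clean backward pass, then update the weights."""
    # ---- Pass 1: clean forward/backward to get gradients w.r.t. hidden activations
    activations = []

    def save_act(module, inp, out):
        out.retain_grad()            # keep gradients of non-leaf hidden activations
        activations.append(out)
        return out

    hooks = [m.register_forward_hook(save_act) for m in model.hidden_layers]
    loss = F.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    for h in hooks:
        h.remove()

    # Layer-wise adversarial noise: sign of the activation gradient, scaled by eta
    noises = [eta * a.grad.sign().detach() for a in activations]

    # ---- Pass 2: forward with noise added to each hidden layer, then update weights
    idx = 0

    def add_noise(module, inp, out):
        nonlocal idx
        out = out + noises[idx]      # inject noise into this hidden layer's output
        idx += 1
        return out

    hooks = [m.register_forward_hook(add_noise) for m in model.hidden_layers]
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x), y)
    adv_loss.backward()
    optimizer.step()
    for h in hooks:
        h.remove()
    return adv_loss.item()
```

Because the noise is derived from gradients already produced by the backward pass, this style of injection adds little overhead beyond one extra forward pass per step, which is consistent with the efficiency claim above.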
