GAT: Generative Adversarial Training for Adversarial Example Detection and Robust Classification

The vulnerability of deep neural networks to adversarial examples has become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods that rely on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper, we consider the adversarial detection problem under the robust optimization framework. We partition the input space into subspaces and train adversarially robust subspace detectors using asymmetrical adversarial training (AAT). The integration of the classifier and the detectors yields a detection mechanism with a performance guarantee against the adversaries it considers. We demonstrate that AAT promotes the learning of class-conditional distributions, which in turn gives rise to generative detection and classification approaches that are both robust and more interpretable. We provide comprehensive evaluations of these methods and demonstrate their competitive performance and compelling properties on adversarial detection and robust classification problems.
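
As a concrete illustration, here is a minimal, hypothetical PyTorch-style sketch of what one asymmetrical adversarial training step for a single class-conditional detector could look like, assuming "asymmetrical" means that only the out-of-class (negative) samples are adversarially perturbed while in-class (positive) samples stay clean. All names (`pgd_attack`, `aat_step`, the `detector` module, and the hyperparameter values) are illustrative and are not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(detector, x, eps=0.3, alpha=0.01, steps=40):
    """Perturb out-of-class inputs x (L-infinity PGD) to maximize the
    detector's logit, i.e. to make them resemble in-class samples."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = detector(x_adv)
        # Descending this loss pushes the detector score up for x_adv.
        loss = F.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def aat_step(detector, optimizer, x_in, x_out):
    """One training step for the class-k detector: positive samples are
    left clean; only negative samples are perturbed (the asymmetry)."""
    x_out_adv = pgd_attack(detector, x_out)
    logits_pos = detector(x_in)          # clean, in-class
    logits_neg = detector(x_out_adv)     # adversarial, out-of-class
    loss = (F.binary_cross_entropy_with_logits(
                logits_pos, torch.ones_like(logits_pos))
            + F.binary_cross_entropy_with_logits(
                logits_neg, torch.zeros_like(logits_neg)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, the per-class detector scores can then be aggregated at test time, e.g. thresholded for adversarial detection or combined via an argmax for the generative classification the abstract refers to; the exact integration scheme is the one specified in the paper itself.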
