Adversarial Attack Type I: Cheat Classifiers by Significant Changes

Despite the great success of deep neural networks, adversarial attacks can cheat well-trained classifiers with small perturbations. In this paper, we propose another type of adversarial attack that can cheat classifiers with significant changes. For example, we can significantly change a face, yet a well-trained neural network still recognizes the adversarial example and the original as the same person. Statistically, the existing adversarial attack increases the Type II error (a false negative: the classifier rejects a true example after an imperceptible perturbation), while the proposed attack targets the Type I error (a false positive: the classifier still accepts a substantially changed example); we therefore name them the Type II and Type I adversarial attacks, respectively. The two types of attack are equally important but essentially different, which we explain intuitively and evaluate numerically. To implement the proposed attack, we design a supervised variational autoencoder and then attack the classifier by updating the latent variables using gradient information. Experimental results show that our method is practical and effective for generating Type I adversarial examples on large-scale image datasets. Most of the generated examples pass detectors designed to defend against Type II attacks, and strengthening a classifier against one type of attack is effective only against that type; both findings imply that the underlying causes of Type I and Type II attacks are different.
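The abstract describes the attack only at a high level; as a minimal sketch of what "updating the latent variables using gradient information" could look like, the following PyTorch-style code assumes a pre-trained supervised variational autoencoder (hypothetical `encoder`, `decoder`, and latent-space classifier `latent_clf`) and a target classifier `f`. All names and the loss weight `lam` are illustrative assumptions, not the authors' exact implementation.

```python
import torch

# Assumed pre-trained components (hypothetical names):
#   encoder(x)    -> latent code z for image x
#   decoder(z)    -> reconstructed image x'
#   latent_clf(z) -> class logits in the supervised latent space
#   f(x)          -> logits of the classifier under attack

def type1_attack(x, y_orig, y_target, encoder, decoder, latent_clf, f,
                 steps=200, lr=0.01, lam=1.0):
    """Sketch of a Type I attack: push the latent code toward a different
    class while keeping the attacked classifier's prediction unchanged."""
    z = encoder(x).detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        x_adv = decoder(z)
        # (1) make the example semantically belong to y_target in latent space
        loss_change = ce(latent_clf(z), y_target)
        # (2) keep the attacked classifier's output fixed at the original label
        loss_keep = ce(f(x_adv), y_orig)
        loss = loss_change + lam * loss_keep
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()
```

A successful run yields an image whose semantic content has moved to the target class (a significant change) while `f` still predicts `y_orig`, i.e., a Type I error of the classifier.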
