Improving Robustness to Adversarial Examples by Encouraging Discriminative Features

Deep neural networks (DNNs) have achieved state-of-the-art results in various pattern recognition tasks. However, they perform poorly on out-of-distribution adversarial examples, i.e., inputs specifically crafted by an adversary to cause DNNs to misbehave, which calls into question the security and reliability of applications built on them. In this paper, we hypothesize that intra-class and inter-class feature variance is one of the reasons behind the existence of adversarial examples. Moreover, learning features with low intra-class and high inter-class variance helps classifiers form more compact decision boundaries that leave fewer low-probability inter-class "pockets" in the feature space, i.e., less room for adversarial perturbations. We achieve this by imposing a center loss [21] in addition to the regular softmax cross-entropy loss while training a DNN classifier. Intuitively, the center loss encourages the DNN to simultaneously learn a center for the deep features of each class and to minimize the distances between intra-class deep features and their corresponding class centers. Our results with state-of-the-art architectures on the MNIST, CIFAR-10, and CIFAR-100 datasets confirm our hypothesis and highlight the importance of discriminative features to the existence of adversarial examples.
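For concreteness, the sketch below shows one way the combined objective could be implemented in PyTorch: the total loss is the softmax cross-entropy plus a weighted center-loss term that pulls each deep feature toward a learnable per-class center. The class CenterLoss, the helper joint_loss, and the weight lam are illustrative assumptions (the paper itself provides no code), and the centers here are optimized jointly by gradient descent rather than with the scheduled center-update rule of Wen et al. [21].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    """Center loss: penalizes the squared distance between each deep feature
    and the learnable center of its ground-truth class."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class, same dimensionality as the deep features.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Gather the center of each sample's class: (batch, feat_dim).
        centers_batch = self.centers[labels]
        # Mean of 0.5 * ||f(x_i) - c_{y_i}||^2 over the batch.
        return 0.5 * ((features - centers_batch) ** 2).sum(dim=1).mean()

def joint_loss(logits, features, labels, center_loss_fn, lam=0.1):
    # Softmax cross-entropy plus lambda-weighted center loss; lam is a
    # hypothetical hyperparameter, not a value reported in the paper.
    return F.cross_entropy(logits, labels) + lam * center_loss_fn(features, labels)

# Hypothetical usage with a feature extractor `backbone` and classifier head `fc`:
#   features = backbone(images); logits = fc(features)
#   loss = joint_loss(logits, features, labels, CenterLoss(10, features.size(1)), lam=0.1)
```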

[1] A. Nguyen et al., "Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images," IEEE CVPR, 2015.

[2] M. Cissé et al., "Houdini: Fooling Deep Structured Prediction Models," arXiv preprint, 2017.

[3] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," ICLR, 2015.

[4] S. Sabour et al., "Adversarial Manipulation of Deep Representations," ICLR, 2016.

[5] K. He et al., "Deep Residual Learning for Image Recognition," IEEE CVPR, 2016.

[6] N. Papernot et al., "Practical Black-Box Attacks against Machine Learning," ACM AsiaCCS, 2017.

[7] M. A. Alcorn et al., "Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects," IEEE/CVF CVPR, 2019.

[8] G. Huang et al., "Densely Connected Convolutional Networks," IEEE CVPR, 2017.

[9] G. Katz et al., "Towards Proving the Adversarial Robustness of Deep Neural Networks," FVAV@iFM, 2017.

[10] V. M. Kabilan et al., "VectorDefense: Vectorization as a Defense to Adversarial Examples," Studies in Computational Intelligence, 2018.

[11] N. Carlini and D. Wagner, "Towards Evaluating the Robustness of Neural Networks," IEEE Symposium on Security and Privacy (SP), 2017.

[12] A. Kurakin et al., "Adversarial Examples in the Physical World," ICLR Workshop, 2017.

[13] N. Akhtar and A. Mian, "Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey," IEEE Access, 2018.

[14] J. Su et al., "One Pixel Attack for Fooling Deep Neural Networks," IEEE Transactions on Evolutionary Computation, 2019.

[15] N. Narodytska and S. P. Kasiviswanathan, "Simple Black-Box Adversarial Perturbations for Deep Networks," arXiv preprint, 2016.

[16] C. Szegedy et al., "Intriguing Properties of Neural Networks," ICLR, 2014.

[17] A. Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks," ICLR, 2018.

[18] I. Goodfellow et al., "Explaining and Harnessing Adversarial Examples," ICLR, 2015.

[19] Y. LeCun et al., "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 1998.

[20] A. Rozsa et al., "Towards Robust Deep Neural Networks with BANG," IEEE WACV, 2018.

[21] Y. Wen et al., "A Discriminative Feature Learning Approach for Deep Face Recognition," ECCV, 2016.