Improving Robustness to Adversarial Examples by Encouraging Discriminative Features

Deep neural networks (DNNs) have achieved state-of-the-art results in various pattern recognition tasks. However, they perform poorly on out-of-distribution adversarial examples, i.e., inputs specifically crafted by an adversary to cause DNNs to misbehave, which calls into question the security and reliability of applications built on them. In this paper, we hypothesize that intra-class and inter-class feature variance is one of the reasons behind the existence of adversarial examples. Moreover, learning features with low intra-class and high inter-class variance helps classifiers form more compact decision boundaries that leave fewer low-probability inter-class "pockets" in the feature space, i.e., less room for adversarial perturbations. We achieve this by imposing a center loss [21] in addition to the regular softmax cross-entropy loss while training a DNN classifier. Intuitively, the center loss encourages the DNN to simultaneously learn a center for the deep features of each class and to minimize the distances between intra-class deep features and their corresponding class centers. Our results with state-of-the-art architectures on the MNIST, CIFAR-10, and CIFAR-100 datasets confirm our hypothesis and highlight the importance of discriminative features to the existence of adversarial examples.
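For concreteness, the sketch below shows one way the combined objective could be implemented in PyTorch: the total loss is the softmax cross-entropy plus a weighted center-loss term that pulls each deep feature toward a learnable per-class center. The class CenterLoss, the helper joint_loss, and the weight lam are illustrative assumptions (the paper itself provides no code), and the centers here are optimized jointly by gradient descent rather than with the scheduled center-update rule of Wen et al. [21].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterLoss(nn.Module):
    """Center loss: penalizes the squared distance between each deep feature
    and the learnable center of its ground-truth class."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class, same dimensionality as the deep features.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Gather the center of each sample's class: (batch, feat_dim).
        centers_batch = self.centers[labels]
        # Mean of 0.5 * ||f(x_i) - c_{y_i}||^2 over the batch.
        return 0.5 * ((features - centers_batch) ** 2).sum(dim=1).mean()

def joint_loss(logits, features, labels, center_loss_fn, lam=0.1):
    # Softmax cross-entropy plus lambda-weighted center loss; lam is a
    # hypothetical hyperparameter, not a value reported in the paper.
    return F.cross_entropy(logits, labels) + lam * center_loss_fn(features, labels)

# Hypothetical usage with a feature extractor `backbone` and classifier head `fc`:
#   features = backbone(images); logits = fc(features)
#   loss = joint_loss(logits, features, labels, CenterLoss(10, features.size(1)), lam=0.1)
```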

[1] A. Nguyen et al., "Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images," IEEE CVPR, 2015.

[2] M. Cissé et al., "Houdini: Fooling Deep Structured Prediction Models," arXiv preprint, 2017.

[3] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," ICLR, 2015.

[4] S. Sabour et al., "Adversarial Manipulation of Deep Representations," ICLR, 2016.

[5] K. He et al., "Deep Residual Learning for Image Recognition," IEEE CVPR, 2016.

[6] N. Papernot et al., "Practical Black-Box Attacks against Machine Learning," ACM AsiaCCS, 2017.

[7] M. A. Alcorn et al., "Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects," IEEE/CVF CVPR, 2019.

[8] G. Huang et al., "Densely Connected Convolutional Networks," IEEE CVPR, 2017.

[9] G. Katz et al., "Towards Proving the Adversarial Robustness of Deep Neural Networks," FVAV@iFM, 2017.

[10] V. M. Kabilan et al., "VectorDefense: Vectorization as a Defense to Adversarial Examples," Studies in Computational Intelligence, 2018.

[11] N. Carlini and D. Wagner, "Towards Evaluating the Robustness of Neural Networks," IEEE Symposium on Security and Privacy (SP), 2017.

[12] A. Kurakin et al., "Adversarial Examples in the Physical World," ICLR Workshop, 2017.

[13] N. Akhtar and A. Mian, "Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey," IEEE Access, 2018.

[14] J. Su et al., "One Pixel Attack for Fooling Deep Neural Networks," IEEE Transactions on Evolutionary Computation, 2019.

[15] N. Narodytska and S. P. Kasiviswanathan, "Simple Black-Box Adversarial Perturbations for Deep Networks," arXiv preprint, 2016.

[16] C. Szegedy et al., "Intriguing Properties of Neural Networks," ICLR, 2014.

[17] A. Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks," ICLR, 2018.

[18] I. Goodfellow et al., "Explaining and Harnessing Adversarial Examples," ICLR, 2015.

[19] Y. LeCun et al., "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 1998.

[20] A. Rozsa et al., "Towards Robust Deep Neural Networks with BANG," IEEE WACV, 2018.

[21] Y. Wen et al., "A Discriminative Feature Learning Approach for Deep Face Recognition," ECCV, 2016.