Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples

State-of-the-art neural networks are vulnerable to adversarial examples: they can easily misclassify inputs that are imperceptibly different from their training and test data. In this work, we establish that the use of the cross-entropy loss function and the low-rank features of the training data are responsible for the existence of these inputs. Based on this observation, we suggest that addressing adversarial examples requires rethinking the use of the cross-entropy loss function and looking for an alternative that is better suited to minimization with low-rank features. In this direction, we present a training scheme called differential training, which uses a loss function defined on the differences between the features of points from opposite classes. We show that differential training can ensure a large margin between the decision boundary of the neural network and the points in the training dataset. This larger margin increases the amount of perturbation needed to flip the prediction of the classifier and makes it harder to find an adversarial example with small perturbations. We test differential training on a binary classification task with the CIFAR-10 dataset and demonstrate that it radically reduces the fraction of images for which an adversarial example can be found, not only in the training dataset but in the test dataset as well.
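
The abstract describes the differential-training loss only at a high level: a loss applied to differences between features of points drawn from opposite classes. The snippet below is a minimal sketch of that idea, assuming a feature extractor whose outputs feed a final linear layer with weights `w`; the all-pairs construction and the softplus (smooth hinge) surrogate are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Sketch of a loss on cross-class feature differences (illustrative, not
# the paper's exact recipe): for features of positive-class and
# negative-class points, score every difference phi(x+) - phi(x-) with the
# linear weights w and penalize pairs whose score is small or negative.
import torch
import torch.nn.functional as F

def differential_loss(feat_pos, feat_neg, w):
    """feat_pos: (N, d) features of positive-class points,
    feat_neg: (M, d) features of negative-class points,
    w: (d,) weights of the final linear layer."""
    diffs = feat_pos.unsqueeze(1) - feat_neg.unsqueeze(0)  # (N, M, d) all pairs
    scores = diffs @ w                                      # (N, M) margin scores
    # Softplus acts as a smooth hinge: large positive scores incur ~0 loss,
    # so minimizing it pushes every cross-class pair apart along w.
    return F.softplus(-scores).mean()
```

In this reading, maximizing the score of every cross-class feature difference is what enforces a margin between the two classes, which is the mechanism the abstract credits for making small adversarial perturbations harder to find.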
