Improve Adversarial Robustness via Weight Penalization on Classification Layer

It is well known that deep neural networks are vulnerable to adversarial attacks. Recent studies show that a well-designed classification layer can lead to better robustness, yet there is still considerable room for improvement along this line. In this paper, we first prove that, from a geometric point of view, the robustness of a neural network is equivalent to an angular-margin condition on the classifier weights. We then explain why ReLU-type activation functions are not a good choice under this framework. These findings reveal the limitations of existing approaches and lead us to develop a novel, lightweight weight-penalization defense that is simple and scales well. Empirical results on multiple benchmark datasets demonstrate that our method effectively improves the robustness of the network with little additional computation, while maintaining high classification accuracy on clean data.
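As a rough illustration of the kind of penalty the abstract describes, the sketch below (in PyTorch) adds a regularization term that discourages small angles between the class weight vectors of the final linear layer. This is a minimal sketch under our own assumptions, not the paper's exact formulation: the function angular_weight_penalty, the coefficient lambda_pen, and the attribute model.fc are hypothetical names introduced only for illustration.

    # Hypothetical sketch: push the classifier's class weight vectors toward
    # large pairwise angular margins, which the abstract relates to robustness.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def angular_weight_penalty(classifier: nn.Linear) -> torch.Tensor:
        """Mean pairwise cosine similarity of the classifier's weight vectors.

        Minimizing this term spreads the class weight vectors apart on the
        unit sphere, i.e. enlarges the angular margin between classes.
        """
        w = F.normalize(classifier.weight, dim=1)      # (num_classes, feat_dim)
        cos = w @ w.t()                                # pairwise cosine similarities
        off_diag = cos - torch.diag(torch.diag(cos))   # drop self-similarity terms
        n = w.size(0)
        return off_diag.sum() / (n * (n - 1))

    # Usage inside an ordinary training step (lambda_pen is a tunable weight):
    # loss = F.cross_entropy(model(x), y) + lambda_pen * angular_weight_penalty(model.fc)

Because the penalty depends only on the final layer's weight matrix, it adds negligible cost per training step, which is consistent with the abstract's claim of a lightweight defense.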
