Improve Adversarial Robustness via Weight Penalization on Classification Layer

It is well known that deep neural networks are vulnerable to adversarial attacks. Recent studies show that a well-designed classification layer can lead to better robustness, yet there is still considerable room for improvement along this line. In this paper, we first prove that, from a geometric point of view, the robustness of a neural network is equivalent to an angular-margin condition on the classifier weights. We then explain why ReLU-type activation functions are not a good choice under this framework. These findings reveal the limitations of existing approaches and lead us to develop a novel, lightweight weight-penalization defense that is simple and scales well. Empirical results on multiple benchmark datasets demonstrate that our method effectively improves the robustness of the network with little additional computation, while maintaining high classification accuracy on clean data.
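As a rough illustration of the kind of penalty the abstract describes, the sketch below (in PyTorch) adds a regularization term that discourages small angles between the class weight vectors of the final linear layer. This is a minimal sketch under our own assumptions, not the paper's exact formulation: the function angular_weight_penalty, the coefficient lambda_pen, and the attribute model.fc are hypothetical names introduced only for illustration.

    # Hypothetical sketch: push the classifier's class weight vectors toward
    # large pairwise angular margins, which the abstract relates to robustness.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def angular_weight_penalty(classifier: nn.Linear) -> torch.Tensor:
        """Mean pairwise cosine similarity of the classifier's weight vectors.

        Minimizing this term spreads the class weight vectors apart on the
        unit sphere, i.e. enlarges the angular margin between classes.
        """
        w = F.normalize(classifier.weight, dim=1)      # (num_classes, feat_dim)
        cos = w @ w.t()                                # pairwise cosine similarities
        off_diag = cos - torch.diag(torch.diag(cos))   # drop self-similarity terms
        n = w.size(0)
        return off_diag.sum() / (n * (n - 1))

    # Usage inside an ordinary training step (lambda_pen is a tunable weight):
    # loss = F.cross_entropy(model(x), y) + lambda_pen * angular_weight_penalty(model.fc)

Because the penalty depends only on the final layer's weight matrix, it adds negligible cost per training step, which is consistent with the abstract's claim of a lightweight defense.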
