Learn Robust Features via Orthogonal Multi-Path

It is now widely known that adversarial attacks, which add imperceptible perturbations to clean images, can fool deep neural networks. To defend against such attacks, we design a block containing multiple paths to learn robust features, where the parameters of the paths are required to be orthogonal to each other. The so-called Orthogonal Multi-Path (OMP) block can be placed in any layer of a neural network. Through forward learning and backward correction, one OMP block makes the neural network learn features that suit all the paths and are hence expected to be robust. With careful design and thorough experiments, e.g., on the positions at which the orthogonality constraint is imposed and on the trade-off between diversity and accuracy, the robustness of the neural networks is significantly improved. For example, under a white-box PGD attack with $l_\infty$ bound $8/255$ (a strong attack that reduces the accuracy of many vanilla neural networks to nearly $10\%$ on CIFAR10), VGG16 with the proposed OMP block keeps over $50\%$ accuracy. Under black-box attacks, neural networks equipped with an OMP block reach over $80\%$ accuracy. The performance under both white-box and black-box attacks is much better than that of existing state-of-the-art adversarial defenders.
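
To make the idea concrete, below is a minimal PyTorch sketch of one possible OMP-style block, assuming parallel convolutional paths whose outputs are averaged and whose flattened weights are pushed towards pairwise orthogonality by a Gram-matrix penalty. The class name, the averaging rule, the penalty weight, and the use of a soft penalty (rather than a hard constraint) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OMPBlock(nn.Module):
    """Hypothetical sketch of an Orthogonal Multi-Path (OMP) block.

    Several parallel convolutional paths process the same input; their
    outputs are averaged, and an orthogonality penalty on the flattened
    path weights discourages the paths from learning redundant features.
    The exact formulation in the paper may differ.
    """

    def __init__(self, channels: int, num_paths: int = 4):
        super().__init__()
        self.paths = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1)
             for _ in range(num_paths)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average the features produced by the parallel paths.
        return torch.stack([p(x) for p in self.paths], dim=0).mean(dim=0)

    def orthogonality_penalty(self) -> torch.Tensor:
        # Flatten each path's weight into a unit vector and penalise the
        # off-diagonal entries of their Gram matrix (pairwise inner
        # products), pushing the path parameters towards orthogonality.
        w = torch.stack([F.normalize(p.weight.flatten(), dim=0)
                         for p in self.paths])
        gram = w @ w.t()
        off_diag = gram - torch.eye(len(self.paths), device=gram.device)
        return off_diag.pow(2).sum()


if __name__ == "__main__":
    block = OMPBlock(channels=64, num_paths=4)
    x = torch.randn(2, 64, 32, 32)  # e.g. CIFAR10-sized feature maps
    y = block(x)
    # Total loss = task loss + weighted orthogonality penalty (weight assumed).
    loss = y.mean() + 1e-3 * block.orthogonality_penalty()
    loss.backward()
    print(y.shape, block.orthogonality_penalty().item())
```

In this sketch the penalty is added to the training loss, so orthogonality is encouraged rather than enforced exactly; a hard-constraint variant (e.g., re-orthogonalizing the path weights after each update) would be an alternative design choice.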
