PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function

The security of deep learning classifiers is a critical field of study because of the existence of adversarial attacks. Such attacks usually rely on the principle of transferability, where an adversarial example crafted on a surrogate classifier tends to mislead the target classifier trained on the same dataset, even if the two classifiers have quite different architectures. Ensemble defenses build on the observation that an adversarial example is less likely to mislead multiple classifiers in an ensemble whose members have diverse decision boundaries. However, recent ensemble methods have either been shown to be vulnerable to stronger adversaries or to lack an end-to-end evaluation. This paper develops a new ensemble methodology that constructs multiple diverse classifiers using a Pairwise Adversarially Robust Loss (PARL) function during training. PARL uses the gradients of every layer with respect to the input, computed simultaneously for all classifiers in the ensemble. The proposed training procedure enables PARL to achieve higher robustness against black-box transfer attacks than previous ensemble methods, without adversely affecting accuracy on clean examples. We also evaluate robustness under white-box attacks, where adversarial examples are crafted using the parameters of the target classifier. We present extensive experiments on the standard image classification datasets CIFAR-10 and CIFAR-100, using the standard ResNet20 classifier, against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble methodology.
Preprint. Under review. arXiv:2112.04948v1 [cs.LG] 9 Dec 2021
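To make the pairwise gradient-diversity idea concrete, the following is a minimal PyTorch sketch of a training step that penalizes alignment between the input gradients of paired ensemble members. It is an illustrative approximation, not the paper's exact PARL formulation: the squared-cosine penalty, the use of only the final loss gradient (rather than gradients of every layer), the weight lambda_div, and the function names are assumptions made for this example.

```python
# Hypothetical sketch of a pairwise gradient-diversity penalty in PyTorch.
# NOTE: illustrative approximation only; the actual PARL loss aggregates
# gradients of every layer with respect to the input, whereas this sketch
# uses only the gradient of the final cross-entropy loss.
import torch
import torch.nn.functional as F


def pairwise_diversity_penalty(models, x, y):
    """Penalize alignment of input gradients across ensemble members."""
    x = x.clone().requires_grad_(True)
    grads = []
    for model in models:
        loss = F.cross_entropy(model(x), y)
        # Gradient of the per-batch loss w.r.t. the input, kept differentiable
        # (create_graph=True) so the penalty can be backpropagated to weights.
        g, = torch.autograd.grad(loss, x, create_graph=True)
        grads.append(g.flatten(start_dim=1))
    penalty = x.new_zeros(())
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            cos = F.cosine_similarity(grads[i], grads[j], dim=1)
            penalty = penalty + (cos ** 2).mean()  # discourage aligned gradients
    return penalty


def ensemble_training_step(models, optimizer, x, y, lambda_div=0.5):
    """One joint step: average cross-entropy plus pairwise diversity penalty."""
    optimizer.zero_grad()
    ce = sum(F.cross_entropy(m(x), y) for m in models) / len(models)
    total = ce + lambda_div * pairwise_diversity_penalty(models, x, y)
    total.backward()
    optimizer.step()
    return total.item()
```

In this sketch, a single optimizer is assumed to hold the parameters of all ensemble members, so the diversity penalty and the classification losses are minimized jointly; pushing the pairwise cosine similarity toward zero encourages the members to develop decision boundaries with dissimilar local gradients, which is the property the transfer attacks exploit.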
