Robust Deep Learning Ensemble Against Deception

Deep neural network (DNN) models are known to be vulnerable to maliciously crafted adversarial examples and to out-of-distribution inputs drawn sufficiently far from the training data. How to protect a machine learning model against both types of destructive inputs remains an open challenge. This paper presents XEnsemble, a diversity-ensemble verification methodology for enhancing the robustness of DNN models against deception by either adversarial examples or out-of-distribution inputs. XEnsemble has three unique capabilities by design. First, it builds diverse input denoising verifiers by leveraging different data-cleaning techniques. Second, it develops a disagreement-diversity ensemble learning methodology for guarding the output of the prediction model against deception. Third, it provides a suite of algorithms that combine input verification and output verification to protect DNN prediction models from both adversarial examples and out-of-distribution inputs. Evaluated on ten popular adversarial attacks and two representative out-of-distribution datasets, XEnsemble achieves a high defense success rate against adversarial examples and a high detection success rate against out-of-distribution inputs, and it outperforms representative existing defense methods in terms of robustness and defensibility.
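
To make the combined input-output verification idea concrete, the sketch below shows one plausible way such a pipeline could be wired together: a few data-cleaning denoisers act as input verifiers, and a disagreement-diverse model ensemble acts as the output verifier. The specific denoisers, thresholds, and the `predict`/`models` interfaces are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch of an input-output verification ensemble in the spirit of
# XEnsemble. Denoisers, thresholds, and model interfaces are assumptions,
# not the authors' implementation.
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter


def reduce_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to 2**bits levels (a simple data-cleaning step)."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


# Diverse input-denoising verifiers built from different data-cleaning techniques.
DENOISERS = [
    lambda x: reduce_bit_depth(x, bits=4),
    lambda x: median_filter(x, size=2),
    lambda x: gaussian_filter(x, sigma=0.5),
]


def input_verification(x, predict, l1_threshold=1.0):
    """Flag x as suspicious if any denoised copy shifts the prediction vector too much."""
    p_raw = predict(x)
    for denoise in DENOISERS:
        p_clean = predict(denoise(x))
        if np.abs(p_raw - p_clean).sum() > l1_threshold:
            return False  # input fails verification (likely adversarial or OOD)
    return True


def output_verification(x, models, agreement_threshold=0.6):
    """Accept a label only if a disagreement-diverse ensemble of models agrees on it."""
    votes = [int(np.argmax(m(x))) for m in models]
    labels, counts = np.unique(votes, return_counts=True)
    best = int(np.argmax(counts))
    if counts[best] / len(models) >= agreement_threshold:
        return int(labels[best])
    return None  # ensemble disagreement: reject as deceptive


def xensemble_predict(x, models):
    """Combine input and output verification; return a label or reject the input (None)."""
    primary = models[0]  # assumed target model under protection
    if not input_verification(x, primary):
        return None
    return output_verification(x, models)
```

In this sketch, an input is rejected either because denoising changes the primary model's prediction beyond a tolerance (input verification) or because the diverse ensemble cannot reach sufficient agreement on a label (output verification); only inputs that pass both checks receive a prediction.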
