Adversarial Learning Targeting Deep Neural Network Classification: A Comprehensive Review of Defenses Against Attacks

With the wide deployment of machine learning (ML)-based systems for a variety of applications, including medical, military, automotive, genomic, multimedia, and social networking, there is great potential for damage from adversarial learning (AL) attacks. In this article, we provide a contemporary survey of AL, focused particularly on defenses against attacks on deep neural network classifiers. After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), backdoor DP, and reverse engineering (RE) attacks and, particularly, defenses against them. In so doing, we distinguish robust classification from anomaly detection (AD), unsupervised from supervised defenses, and statistical hypothesis-based defenses from those without an explicit null (no-attack) hypothesis. We also consider several scenarios for detecting backdoors. We provide a technical assessment of the reviewed works, identifying any issues/limitations, required hyperparameters, and computational complexity, as well as the performance measures evaluated and the quality obtained. We then delve deeper, providing novel insights that challenge conventional AL wisdom and that target unresolved issues, including: robust classification versus AD as a defense strategy; the belief that attack success increases with attack strength, which ignores susceptibility to AD; whether small perturbations for TTE attacks are a fallacy or a requirement; the validity of the universal assumption that a TTE attacker knows the ground-truth class of the example to be attacked; black-, gray-, or white-box attacks as the standard for defense evaluation; and the susceptibility of query-based RE to an AD defense. We also discuss attacks on the privacy of training data. We then present benchmark comparisons of several defenses against TTE, RE, and backdoor DP attacks on images. The article concludes with a discussion of continuing research directions, including the supreme challenge of detecting attacks whose goal is not to alter classification decisions but rather to embed, without detection, “fake news” or other false content.
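
To make the TTE setting above concrete, the following is a minimal, illustrative sketch (not an attack or defense from the surveyed works) of a fast gradient sign method (FGSM)-style evasion attack on a generic PyTorch classifier; the names model, x, y, and the perturbation strength eps are assumptions for illustration only. It also illustrates the trade-off noted above: a larger eps makes misclassification more likely but pushes the perturbed input further from the clean-data distribution, making it easier for an AD defense to flag.

    import torch
    import torch.nn.functional as F

    def fgsm_tte_attack(model, x, y, eps):
        # One signed-gradient step that increases the loss of the true class y,
        # followed by a clamp back to the valid input range [0, 1] so the
        # perturbation stays "small" in the sense discussed above.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + eps * x_adv.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()

    # Hypothetical usage, for an image batch x in [0, 1] with true labels y:
    #   x_adv = fgsm_tte_attack(model, x, y, eps=0.03)
    # A stronger attack (e.g., eps=0.3) succeeds more often but produces inputs
    # that deviate more visibly from clean data, aiding anomaly detection.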
