Adversarial Learning in Statistical Classification: A Comprehensive Review of Defenses Against Attacks

Adversarial learning (AL) attacks on machine-learning-based systems have great potential for damage. In this paper, we provide a contemporary survey of AL, focused particularly on defenses against attacks on statistical classifiers. After introducing relevant terminology and the goals and range of possible knowledge of both attackers and defenders, we survey recent work on test-time evasion (TTE), data poisoning (DP), and reverse engineering (RE) attacks, and particularly defenses against them. In so doing, we distinguish robust classification from anomaly detection (AD), unsupervised from supervised defenses, and statistical hypothesis-based defenses from those without an explicit null (no-attack) hypothesis; for each method, we identify the hyperparameters it requires, its computational complexity, the performance measures on which it was evaluated, and the quality achieved. We then dig deeper, providing novel insights that challenge conventional AL wisdom and target unresolved issues, including: 1) robust classification versus AD as a defense strategy; 2) the belief that attack success increases with attack strength, which ignores susceptibility to AD; 3) small perturbations for test-time evasion attacks: a fallacy or a requirement?; 4) the validity of the universal assumption that a TTE attacker knows the ground-truth class of the example to be attacked; 5) black-, grey-, or white-box attacks as the standard for defense evaluation; 6) the susceptibility of query-based RE to an AD defense. We also discuss attacks on the privacy of training data. We then present benchmark comparisons of several defenses against TTE, RE, and backdoor DP attacks on images. The paper concludes with a discussion of future work.
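To make the TTE attack setting concrete, the sketch below implements the fast gradient sign method (FGSM), a canonical small-perturbation evasion attack of the kind surveyed here. It is an illustrative example only, not the paper's benchmark setup; the toy model, the epsilon value, and the MNIST-sized random input are hypothetical stand-ins.

import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Return an adversarially perturbed copy of x (assumes pixels in [0, 1]).

    FGSM perturbs the test input by epsilon in the sign direction of the
    loss gradient, i.e., the small-perturbation TTE regime discussed above.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

if __name__ == "__main__":
    # Hypothetical toy classifier and input, solely for illustration.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.rand(1, 1, 28, 28)   # stand-in for a normalized test image
    y = torch.tensor([3])          # assumed ground-truth label (see issue 4 above)
    x_adv = fgsm_attack(model, x, y)
    print((x_adv - x).abs().max())  # perturbation magnitude bounded by epsilon

Note that the attack as written assumes the attacker knows the ground-truth label y, precisely the assumption questioned in item 4 of the abstract.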
