When Not to Classify: Anomaly Detection of Attacks (ADA) on DNN Classifiers at Test Time

Adversarial learning attacks pose a significant threat to the recent, wide deployment of machine learning–based systems, including deep neural networks (DNNs). The main focus here is on evasion attacks against DNN-based classifiers at test time. While much work has been devoted to devising attacks that make small perturbations to a test pattern (e.g., an image) that induce a change in the classifier's decision, until recently there has been a relative paucity of work defending against such attacks. Some works robustify the classifier to make correct decisions on perturbed patterns. This is an important objective for some applications and for natural adversary scenarios. However, we analyze the possible digital evasion attack mechanisms and show that, in some important cases, correctly classifying an attacked pattern (image) has no utility: when the image to be attacked is (even arbitrarily) selected from the attacker's cache, and when the sole recipient of the classifier's decision is the attacker. Moreover, in some application domains and scenarios, detecting the attack is highly actionable irrespective of whether classification in the face of it is correct (with classification still performed if no attack is detected). We hypothesize that adversarial perturbations are machine detectable even if they are small. We propose a purely unsupervised anomaly detector (AD) that, unlike previous works, (1) models the joint density of a deep layer using highly suitable null-hypothesis density models (matched in particular to the nonnegative support of rectified linear unit (ReLU) layers); (2) exploits multiple DNN layers; and (3) leverages a source and destination class concept, source class uncertainty, the class confusion matrix, and DNN weight information in constructing a novel decision statistic grounded in the Kullback-Leibler divergence. Tested on the MNIST and CIFAR image databases under three prominent attack strategies, our approach outperforms previous detection methods, achieving strong receiver operating characteristic area-under-the-curve (ROC AUC) detection accuracy on two attacks and better accuracy than recently reported for a variety of methods on the strongest, Carlini-Wagner (CW) attack. We also evaluate a fully white-box attack on our system and demonstrate that our method can be leveraged to strong effect in detecting reverse-engineering attacks. Finally, we evaluate other important performance measures, such as classification accuracy versus true detection rate, and multiple measures versus attack strength.
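
As a rough, illustrative sketch of the detection principle only (not the paper's actual ADA statistic, which models the joint density of deep-layer features across multiple layers and incorporates source-class uncertainty, the confusion matrix, and DNN weights), the Python snippet below fits a simple per-class null with nonnegative support (independent exponential marginals) to ReLU activations of clean data and scores a test pattern by a log-likelihood ratio contrasting the DNN's predicted ("destination") class against the best-fitting alternative ("source") class. All function names and the choice of exponential marginals are assumptions made here for illustration.

import numpy as np

def fit_null_models(activations, labels, num_classes, eps=1e-6):
    # Fit one exponential rate per activation dimension, per class,
    # using clean (unattacked) held-out data; MLE rate = 1 / sample mean.
    rates = []
    for c in range(num_classes):
        acts_c = activations[labels == c]   # shape (N_c, D), nonnegative ReLU outputs
        rates.append(1.0 / (acts_c.mean(axis=0) + eps))
    return np.stack(rates)                  # shape (num_classes, D)

def log_lik(x, rate, eps=1e-6):
    # Log-likelihood of a nonnegative activation vector x under
    # independent exponential marginals with the given rates.
    return np.sum(np.log(rate + eps) - rate * x)

def anomaly_score(x, predicted_class, rates):
    # Higher score means the activations are explained better by some
    # alternative ("source") class null than by the predicted
    # ("destination") class null, suggesting a possible evasion attack.
    ll_dest = log_lik(x, rates[predicted_class])
    ll_src = max(log_lik(x, rates[c])
                 for c in range(len(rates)) if c != predicted_class)
    return ll_src - ll_dest

In use, one would extract deep-layer activations for a clean held-out set to fit the nulls, compute the score for each test image's activations, and declare an attack when the score exceeds a threshold chosen for a target false-positive rate, classifying the image only if no attack is detected.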
