Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples

A large body of recent work has investigated evasion attacks on deep learning systems, in which adding norm-bounded perturbations to test inputs causes incorrect classification. This work has largely focused on closed-world systems, where training and test inputs follow a pre-specified distribution. However, real-world deployments of deep learning, such as autonomous driving and content classification, are likely to operate in an open-world environment. In this paper, we demonstrate the success of open-world evasion attacks, in which adversarial examples are generated from out-of-distribution inputs (OOD adversarial examples). Our study uses 11 state-of-the-art neural network models trained on 3 image datasets of varying complexity. We first demonstrate that state-of-the-art detectors of out-of-distribution data are not robust against OOD adversarial examples. We then consider 5 known defenses against adversarial examples, including state-of-the-art robust training methods, and show that against these defenses, OOD adversarial examples can achieve up to 4$\times$ higher target success rates than adversarial examples generated from in-distribution data. We also take a quantitative look at how open-world evasion attacks may affect real-world systems. Finally, we present first steps towards a robust open-world machine learning system.
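
To make the attack setting concrete, the sketch below shows how a targeted, norm-bounded adversarial example could be crafted starting from an out-of-distribution input rather than an in-distribution test image. This is a minimal illustration using a standard targeted PGD-style attack in PyTorch; the model, perturbation budget, step size, and iteration count are illustrative assumptions, not the paper's exact attack configuration.

```python
# Minimal sketch (assumed setup, not the authors' exact method): a targeted
# L_inf PGD attack that starts from an out-of-distribution image x_ood and
# pushes the classifier toward an attacker-chosen target class.
import torch
import torch.nn.functional as F

def targeted_pgd_ood(model, x_ood, target_class, eps=8/255, alpha=2/255, steps=40):
    """Return an OOD adversarial example: a perturbed copy of x_ood that stays
    within an L_inf ball of radius eps but is classified as target_class."""
    x_adv = x_ood.clone().detach()
    target = torch.full((x_ood.size(0),), target_class, dtype=torch.long,
                        device=x_ood.device)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Step toward the target class, then project back into the eps-ball
        # around the original OOD input and the valid pixel range.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x_ood - eps), x_ood + eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

In the closed-world setting the same procedure would be run on an in-distribution test image; here the starting point is drawn from a different distribution, which is what distinguishes OOD adversarial examples from the standard threat model.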
