MixDefense: A Defense-in-Depth Framework for Adversarial Example Detection Based on Statistical and Semantic Analysis

Machine learning with deep neural networks (DNNs) has become one of the foundation techniques in many safety-critical systems, such as autonomous vehicles and medical diagnosis systems. DNN-based systems, however, are known to be vulnerable to adversarial examples (AEs), which are maliciously perturbed variants of legitimate inputs. While there is a vast body of research on defending against AE attacks, the performance of existing defense techniques is still far from satisfactory, especially under adaptive attacks, wherein attackers are knowledgeable about the defense mechanism and craft AEs accordingly. In this work, we propose a multilayer defense-in-depth framework for AE detection, namely MixDefense. The first layer targets AEs with large perturbations: we leverage the ‘noise’ features extracted from the inputs to discover the statistical difference between natural images and tampered ones for AE detection. For AEs with small perturbations, the inference results of such inputs largely deviate from their semantic information. Consequently, we propose a novel learning-based solution that models such contradictions for AE detection. Both layers are resilient to adaptive attacks because no gradient propagation path exists for AE generation. Experimental results with various AE attack methods on image classification datasets show that the proposed MixDefense solution outperforms existing AE detection techniques by a considerable margin.
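To make the two-layer structure concrete, the following is a minimal sketch of what the first, statistics-based layer might look like, assuming a simple high-pass noise residual and a threshold on a few summary statistics. The residual filter, the chosen statistics, and the z-score decision rule are illustrative assumptions for exposition, not the paper's actual feature extractor or classifier.

```python
# Hedged sketch of a statistics-based AE screen in the spirit of the first layer.
# The residual filter, summary statistics, and threshold are illustrative assumptions.
import numpy as np

def noise_residual(image: np.ndarray) -> np.ndarray:
    """High-pass 'noise' residual of a grayscale image (H x W, float in [0, 1]),
    obtained by subtracting each pixel's 4-neighbour mean."""
    padded = np.pad(image, 1, mode="edge")
    neighbour_mean = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return image - neighbour_mean

def residual_statistics(image: np.ndarray) -> np.ndarray:
    """Summarise the residual with a few higher-order moments, which tend to
    shift when an image has been tampered with."""
    r = noise_residual(image).ravel()
    return np.array([r.std(), np.abs(r).mean(), ((r - r.mean()) ** 4).mean()])

def flag_as_adversarial(image: np.ndarray,
                        clean_mean: np.ndarray,
                        clean_std: np.ndarray,
                        z_threshold: float = 3.0) -> bool:
    """Flag the input if its residual statistics deviate from those of a
    held-out clean set (estimated offline) by more than z_threshold sigmas."""
    z = np.abs((residual_statistics(image) - clean_mean) / (clean_std + 1e-12))
    return bool(z.max() > z_threshold)
```

For the second, semantics-based layer, one plausible reading of "modeling the contradiction between the inference result and the semantic information" is to compare the classifier's predicted label against the class whose prototype is nearest in a separately trained embedding space. The embedding network, the prototypes, and the disagreement rule below are again assumptions made for illustration, not the paper's design.

```python
# Hedged sketch of a semantic-consistency check; the prototype construction
# and disagreement rule are illustrative assumptions.
import numpy as np

def semantic_disagreement(image_embedding: np.ndarray,
                          class_prototypes: np.ndarray,
                          predicted_label: int) -> bool:
    """Flag the input when the classifier's predicted label differs from the
    label of the nearest class prototype in the embedding space.
    class_prototypes has shape (num_classes, embed_dim); prototypes would be
    computed offline as the mean embedding of clean training images per class."""
    distances = np.linalg.norm(class_prototypes - image_embedding, axis=1)
    nearest_label = int(np.argmin(distances))
    return nearest_label != predicted_label
```

In both sketches the decision rule is non-differentiable (a threshold comparison and an argmin), which illustrates the abstract's point that the detection layers expose no gradient propagation path an adaptive attacker could exploit.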

[1]  Johannes Stallkamp,et al.  Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition , 2012, Neural Networks.

[2]  Vincent Cheval,et al.  DEEPSEC: Deciding Equivalence Properties in Security Protocols Theory and Practice , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[3]  John Duchi,et al.  Understanding and Mitigating the Tradeoff Between Robustness and Accuracy , 2020, ICML.

[4]  Eduardo Valle,et al.  Adversarial Attacks on Variational Autoencoders , 2018, LatinX in AI at Neural Information Processing Systems Conference 2018.

[5]  Siwei Lyu,et al.  Detecting Hidden Messages Using Higher-Order Statistics and Support Vector Machines , 2002, Information Hiding.

[6]  Aleksander Madry,et al.  On Evaluating Adversarial Robustness , 2019, ArXiv.

[7]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[8]  Jonas Mueller,et al.  Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks , 2021, NeurIPS Datasets and Benchmarks.

[9]  Zoubin Ghahramani,et al.  A study of the effect of JPG compression on adversarial images , 2016, ArXiv.

[10]  Zhihao Zheng,et al.  Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks , 2018, NeurIPS.

[11]  Patrick D. McDaniel,et al.  On the (Statistical) Detection of Adversarial Examples , 2017, ArXiv.

[12]  Nicholas Carlini,et al.  Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations , 2020, ICML.

[13]  Suman Jana,et al.  Certified Robustness to Adversarial Examples with Differential Privacy , 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[14]  James Bailey,et al.  Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality , 2018, ICLR.

[15]  Pramod K. Varshney,et al.  Anomalous Example Detection in Deep Learning: A Survey , 2020, IEEE Access.

[16]  Takeru Miyato,et al.  cGANs with Projection Discriminator , 2018, ICLR.

[17]  Dongdong Hou,et al.  Detection Based Defense Against Adversarial Examples From the Steganalysis Point of View , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Roberto Caldelli,et al.  Adversarial Examples Detection in Features Distance Spaces , 2018, ECCV Workshops.

[19]  Wei Su,et al.  Steganalysis based on Markov Model of Thresholded Prediction-Error Image , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[20]  Tony R. Martinez,et al.  Improving classification accuracy by identifying and removing instances that should be misclassified , 2011, The 2011 International Joint Conference on Neural Networks.

[21]  Samuel Henrique Silva,et al.  Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey , 2020, ArXiv.

[22]  Jessica J. Fridrich,et al.  New blind steganalysis and its implications , 2006, Electronic Imaging.

[23]  Rama Chellappa,et al.  Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models , 2018, ICLR.

[24]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[25]  Dan Boneh,et al.  Ensemble Adversarial Training: Attacks and Defenses , 2017, ICLR.

[26]  Moustapha Cissé,et al.  Countering Adversarial Images using Input Transformations , 2018, ICLR.

[27]  Xiaofeng Wang,et al.  Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction , 2017, IEEE Transactions on Dependable and Secure Computing.

[28]  Ian J. Goodfellow,et al.  Technical Report on the CleverHans v2.1.0 Adversarial Examples Library , 2016 .

[29]  Nic Ford,et al.  Adversarial Examples Are a Natural Consequence of Test Error in Noise , 2019, ICML.

[30]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Harini Kannan,et al.  Adversarial Logit Pairing , 2018, NIPS 2018.

[32]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[33]  George Kesidis,et al.  When Not to Classify: Anomaly Detection of Attacks (ADA) on DNN Classifiers at Test Time , 2017, Neural Computation.

[34]  Wen-Chuan Lee,et al.  NIC: Detecting Adversarial Samples with Neural Network Invariant Checking , 2019, NDSS.

[35]  Changshui Zhang,et al.  Deep Defense: Training DNNs with Improved Adversarial Robustness , 2018, NeurIPS.

[36]  Ian S. Fischer,et al.  Adversarial Transformation Networks: Learning to Generate Adversarial Examples , 2017, ArXiv.

[37]  Ser-Nam Lim,et al.  PyTorch Metric Learning , 2020, ArXiv.

[38]  Michael I. Jordan,et al.  Theoretically Principled Trade-off between Robustness and Accuracy , 2019, ICML.

[39]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[40]  Cho-Jui Hsieh,et al.  Towards Robust Neural Networks via Random Self-ensemble , 2017, ECCV.

[41]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[42]  Dawn Xiaodong Song,et al.  Adversarial Examples for Generative Models , 2017, 2018 IEEE Security and Privacy Workshops (SPW).

[43]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Tomás Pevný,et al.  Steganalysis by Subtractive Pixel Adjacency Matrix , 2009, IEEE Transactions on Information Forensics and Security.

[45]  Matthias Bethge,et al.  Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX , 2020, J. Open Source Softw..

[46]  Samy Bengio,et al.  Adversarial examples in the physical world , 2016, ICLR.

[47]  Yuchen Zhang,et al.  Defending against Whitebox Adversarial Attacks via Randomized Discretization , 2019, AISTATS.

[48]  Hao Chen,et al.  MagNet: A Two-Pronged Defense against Adversarial Examples , 2017, CCS.

[49]  Yang Song,et al.  PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples , 2017, ICLR.

[50]  Ananthram Swami,et al.  The Limitations of Deep Learning in Adversarial Settings , 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[51]  Alan L. Yuille,et al.  Mitigating adversarial effects through randomization , 2017, ICLR.

[52]  Insup Lee,et al.  VisionGuard: Runtime Detection of Adversarial Inputs to Perception Systems , 2020, ArXiv.

[53]  Marcin Detyniecki,et al.  Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection , 2019, ArXiv.

[54]  Ananthram Swami,et al.  Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks , 2015, 2016 IEEE Symposium on Security and Privacy (SP).

[55]  Mingyan Liu,et al.  Spatially Transformed Adversarial Examples , 2018, ICLR.

[56]  Li Chen,et al.  SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression , 2018, KDD.

[57]  Xiapu Luo,et al.  A Tale of Evil Twins: Adversarial Inputs versus Poisoned Models , 2019, CCS.

[58]  Ryan R. Curtin,et al.  Detecting Adversarial Samples from Artifacts , 2017, ArXiv.

[59]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[60]  Ben Y. Zhao,et al.  Gotta Catch'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks , 2019, CCS.

[61]  Jessica J. Fridrich,et al.  Rich Models for Steganalysis of Digital Images , 2012, IEEE Transactions on Information Forensics and Security.

[62]  Jingyi Wang,et al.  Adversarial Sample Detection for Deep Neural Network through Model Mutation Testing , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[63]  Ting Wang,et al.  DEEPSEC: A Uniform Platform for Security Analysis of Deep Learning Model , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[64]  Ning Chen,et al.  Improving Adversarial Robustness via Promoting Ensemble Diversity , 2019, ICML.

[65]  Jan Hendrik Metzen,et al.  On Detecting Adversarial Perturbations , 2017, ICLR.

[66]  Yanjun Qi,et al.  Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks , 2017, NDSS.