Defending Against Adversarial Samples Without Security through Obscurity

It has recently been shown that deep neural networks (DNNs) are susceptible to a particular type of attack that exploits a fundamental flaw in their design. The attack consists of generating synthetic examples referred to as adversarial samples. These samples are constructed by slightly perturbing real data points so that they "fool" the original DNN model, forcing it to misclassify, with high confidence, samples it previously classified correctly. Many believe that addressing this flaw is essential before DNNs can be used in critical applications such as cyber security. Previous work has shown that learning algorithms designed to enhance the robustness of DNN models all rely on "security through obscurity": security can be guaranteed only if the learning algorithm is hidden from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms remain susceptible to adversarial samples. In this work, we examine how previous research dealt with this issue and propose a generic approach to enhance a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with the DNN, keeping it robust even when the underlying learning algorithm is revealed. To demonstrate the generality of the proposed approach and its potential for cyber security applications, we evaluate our method and several existing solutions on publicly available datasets, including a large-scale malware dataset as well as the MNIST and IMDB datasets. Our results indicate that our approach typically provides superior classification performance and greater robustness to attacks than state-of-the-art solutions.
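To make the idea of "integrating a data transformation module with the DNN" concrete, the sketch below shows one way such a pipeline could be wired up. It is a minimal illustration only, assuming a fixed (non-trainable) random projection as the transformation and a small fully connected classifier; the module names, dimensions, and the choice of projection are assumptions for illustration and not the paper's actual implementation.

```python
# Minimal sketch (not the authors' implementation): prepend a data
# transformation module to a DNN so the classifier never operates on
# raw inputs. The fixed random projection is an illustrative assumption.
import torch
import torch.nn as nn


class DataTransform(nn.Module):
    """Non-trainable transformation: a fixed random projection."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # Registered as a buffer, not a parameter, so it is never trained.
        self.register_buffer("proj", torch.randn(in_dim, out_dim) / out_dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten each sample and project it into the transformed space.
        return x.flatten(1) @ self.proj


class TransformedDNN(nn.Module):
    """DNN classifier that only ever sees transformed inputs."""

    def __init__(self, in_dim: int = 784, hidden: int = 256, n_classes: int = 10):
        super().__init__()
        self.transform = DataTransform(in_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.transform(x))


if __name__ == "__main__":
    model = TransformedDNN()
    dummy = torch.randn(8, 1, 28, 28)   # MNIST-sized batch
    print(model(dummy).shape)            # torch.Size([8, 10])
```

The point of the design is that the transformation sits in front of the classifier as part of the deployed model, so robustness does not depend on keeping the training procedure secret; the specific transformation used in the paper may differ from the random projection assumed here.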
