Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection

Although deep neural networks have shown promising performance on various tasks, they are susceptible to incorrect predictions induced by imperceptibly small perturbations of their inputs. A large number of previous works have proposed methods to detect adversarial attacks, yet most of them fail against adaptive white-box attacks, in which the adversary has full knowledge of both the model and the defense. In this paper, we propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features. We regard non-robust features as a property common to adversarial examples and deduce that a cluster corresponding to this property can be found in representation space. This idea leads us to estimate the probability distribution of adversarial representations as a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
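
To make the detection idea concrete, below is a minimal sketch, assuming penultimate-layer representations have already been extracted for clean and adversarial examples. It models each cluster with a Gaussian mixture purely as an illustrative stand-in for the paper's density estimate; the array names (clean_reprs, adv_reprs, test_repr) and the threshold are hypothetical placeholders, not part of the proposed method.

    # Sketch of a likelihood-based detector over clusters in representation space.
    # Assumes representations are NumPy arrays of shape (N, D).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_cluster(representations, n_components=1):
        """Fit a Gaussian cluster model to a set of representations."""
        gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        gmm.fit(representations)
        return gmm

    def detection_score(clean_model, adv_model, z):
        """Log-likelihood ratio: higher means z looks more adversarial."""
        z = np.atleast_2d(z)
        return adv_model.score_samples(z) - clean_model.score_samples(z)

    # Usage sketch with hypothetical data:
    # clean_model = fit_cluster(clean_reprs)   # cluster of clean representations
    # adv_model   = fit_cluster(adv_reprs)     # separate adversarial cluster
    # is_adv = detection_score(clean_model, adv_model, test_repr) > threshold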
