A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples

Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples: inputs crafted by adding small but purposeful perturbations that cause incorrect predictions while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to take theoretical steps towards fully understanding adversarial examples. Using concepts from topology, our analysis brings forth the key reasons why an adversarial example can fool a classifier ($f_1$) and incorporates the classifier's oracle ($f_2$, such as human perception) into the analysis. By investigating the topological relationship between two (pseudo)metric spaces corresponding to the predictor $f_1$ and the oracle $f_2$, we develop necessary and sufficient conditions that determine whether $f_1$ is always robust (strong-robust) against adversarial examples according to $f_2$. Interestingly, our theorems indicate that just one unnecessary feature can make $f_1$ not strong-robust, and that the right feature representation learning is the key to obtaining a classifier that is both accurate and strong-robust.
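To make the notion concrete, one plausible reading of strong-robustness (a sketch only; the symbols $d_2$ and $\delta$ are illustrative assumptions, not the paper's exact notation) is that the predictor may never change its label on inputs the oracle regards as indistinguishable:

\[
\forall\, x, x' :\quad d_2(x, x') < \delta \;\Longrightarrow\; f_1(x) = f_1(x'),
\]

where $d_2$ is the oracle's (pseudo)metric and $\delta > 0$ bounds perturbations that are imperceptible under $f_2$. Under this reading, an unnecessary feature used by $f_1$ but ignored by $f_2$ gives an attacker a direction along which $d_2$ stays near zero while $f_1$'s decision can still change, which is the failure mode the theorems formalize.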
