Feature Purification: How Adversarial Training Performs Robust Deep Learning

Despite the empirical success of adversarial training in defending deep learning models against adversarial perturbations, it remains unclear why adversarial perturbations exist and what adversarial training does to a neural network to remove them. In this paper, we present a principle that we call feature purification: we show that one cause of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network, and, more importantly, that one goal of adversarial training is to remove such mixtures and thereby purify the hidden weights. We present experiments on the CIFAR-10 dataset to illustrate this principle, as well as a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle. Technically, we give, to the best of our knowledge, the first result proving that the following two statements can hold simultaneously when training a neural network with ReLU activation: (1) training over the original data is indeed non-robust to small adversarial perturbations of some radius, and (2) adversarial training, even with an empirical perturbation algorithm such as FGM, is in fact provably robust against any perturbation of the same radius. Finally, we also prove a complexity lower bound showing that low-complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel of this network cannot defend against perturbations of this same radius, no matter what algorithm is used to train them. The full version of this paper can be found at https://arxiv.org/abs/2005.10190.
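To make the training procedure discussed above concrete, here is a minimal PyTorch sketch of adversarial training with a one-step, l2-normalized FGM attack on a two-layer ReLU network. This is an illustrative sketch, not the paper's exact setup: the architecture, loss, and all hyperparameters (input dimension d, hidden width m, perturbation radius eps, learning rate lr) are assumptions chosen for clarity.

```python
# A minimal sketch of FGM adversarial training on a two-layer ReLU network.
# Assumed (illustrative) setup: binary labels in {-1, +1}, logistic loss,
# l2 perturbations of radius eps; not the paper's exact configuration.
import torch
import torch.nn as nn

d, m, eps, lr = 784, 512, 0.5, 0.01            # input dim, width, l2 radius, step size
net = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))
opt = torch.optim.SGD(net.parameters(), lr=lr)
loss_fn = nn.SoftMarginLoss()                  # logistic loss for +/-1 labels

def fgm_perturb(x, y):
    """One-step l2 FGM: move x by eps along the normalized input gradient."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(net(x).squeeze(-1), y)
    grad, = torch.autograd.grad(loss, x)       # gradient w.r.t. the input only
    norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1)
    return (x + eps * grad / norm).detach()

def adv_train_step(x, y):
    x_adv = fgm_perturb(x, y)                  # attack the current network
    opt.zero_grad()
    loss = loss_fn(net(x_adv).squeeze(-1), y)  # update weights on perturbed inputs
    loss.backward()
    opt.step()
    return loss.item()

# Usage: one training step on a random batch with +/-1 labels.
x = torch.randn(64, d)
y = torch.randint(0, 2, (64,)).float() * 2 - 1
print(adv_train_step(x, y))
```

In the paper's theoretical setting, clean gradient descent learns the "pure" features plus small dense mixtures, and iterating a loop like adv_train_step is what removes the mixture component from the hidden weights; the sketch mirrors only the algorithmic loop, not the analysis.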
