Toward Robust Neural Networks via Sparsification

It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks. In this paper, we make the case that a systematic exploitation of sparsity is key to defending against such attacks, and that a "locally linear" model for neural networks can be used to develop a theoretical foundation for crafting attacks and defenses. We consider two defenses. The first is a sparsifying front end, which attenuates the impact of the attack by a factor of roughly $K/N$, where $N$ is the data dimension and $K$ is the sparsity level. The second is sparsification of network weights, which attenuates the worst-case growth of an attack as it flows up the network. We also devise attacks based on the locally linear model that outperform the well-known fast gradient sign method (FGSM) attack. We provide experimental results for the MNIST and Fashion-MNIST datasets, showing the efficacy of the proposed sparsity-based defenses.
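As a rough illustration of the two defenses, the sketch below projects an input onto its $K$ largest-magnitude coefficients in an orthonormal basis before classification (the projection that yields the roughly $K/N$ attenuation), and applies magnitude-based pruning to a weight matrix. The basis `Psi`, the sparsity level `K`, and the function names are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def sparsifying_front_end(x, Psi, K):
    """Keep only the K largest-magnitude coefficients of x in the
    orthonormal basis Psi (columns are basis vectors), then reconstruct.
    This kind of projection attenuates an attack by roughly K/N."""
    coeffs = Psi.T @ x                      # analysis: basis coefficients
    keep = np.argsort(np.abs(coeffs))[-K:]  # indices of the K largest coefficients
    sparse = np.zeros_like(coeffs)
    sparse[keep] = coeffs[keep]             # zero out all but the top K
    return Psi @ sparse                     # synthesis: back to the signal domain

def prune_weights(W, frac=0.1):
    """Magnitude-based weight sparsification: keep the largest `frac`
    fraction of entries of W (by absolute value) and zero the rest."""
    k = max(1, int(frac * W.size))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

# Toy usage: a random orthonormal basis stands in for a wavelet/DCT basis.
rng = np.random.default_rng(0)
N, K = 784, 50
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))
x = rng.standard_normal(N)
x_hat = sparsifying_front_end(x, Psi, K)
W_sparse = prune_weights(rng.standard_normal((128, N)), frac=0.05)
```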
