The Dimpled Manifold Model of Adversarial Examples in Machine Learning

The extreme fragility of deep neural networks when presented with tiny perturbations of their inputs was independently discovered by several research groups in 2013, but in spite of enormous effort, these adversarial examples remained a baffling phenomenon with no clear explanation. In this paper we introduce a new conceptual framework (which we call the Dimpled Manifold Model) that provides a simple explanation for why adversarial examples exist, why their perturbations have such tiny norms, why these perturbations look like random noise, and why networks that were adversarially trained with incorrectly labeled images can still correctly classify test images. In the last part of the paper we describe the results of numerous experiments which strongly support this new model, and in particular our assertion that adversarial perturbations are roughly perpendicular to the low-dimensional manifold which contains all the training examples.
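
As a rough, purely illustrative sketch of the kind of measurement the experiments describe, the following toy construction (our own illustration, not the paper's code) trains a small classifier on synthetic data lying near a known low-dimensional linear subspace, crafts an FGSM-style adversarial perturbation, and decomposes it into components parallel and perpendicular to a PCA-estimated manifold. The dimensions, model, and attack step here are all illustrative assumptions; the paper's experiments work with real image datasets and learned (nonlinear) manifold estimates.

# Toy sketch (illustrative assumptions only): measure how much of an adversarial
# perturbation lies off a PCA-estimated data manifold. Synthetic data stands in
# for real images.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, k, n = 256, 20, 1000          # ambient dim, manifold dim, number of samples

# Synthetic "images": points near a k-dimensional linear subspace of R^d.
basis = torch.linalg.qr(torch.randn(d, k)).Q      # orthonormal on-manifold directions
coords = torch.randn(n, k)
X = coords @ basis.T + 0.01 * torch.randn(n, d)   # small off-manifold noise
y = (coords[:, 0] > 0).long()                     # labels depend only on manifold coordinates

# Small classifier trained on the synthetic data.
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(model(X), y).backward()
    opt.step()

# FGSM-style adversarial perturbation for one sample.
x = X[:1].clone().requires_grad_(True)
F.cross_entropy(model(x), y[:1]).backward()
delta = 0.1 * x.grad.sign()                       # adversarial perturbation

# Estimate the manifold's tangent space by PCA on the training set, then split
# the perturbation into on-manifold and off-manifold (perpendicular) parts.
Xc = X - X.mean(dim=0)
_, _, Vt = torch.linalg.svd(Xc, full_matrices=False)
tangent = Vt[:k].T                                # top-k principal directions (d x k)
delta_on = delta @ tangent @ tangent.T            # projection onto the tangent space
delta_off = delta - delta_on

print(f"on-manifold norm:  {delta_on.norm():.3f}")
print(f"off-manifold norm: {delta_off.norm():.3f}")
# If the dimpled-manifold picture holds, most of the perturbation's norm should
# sit in the off-manifold (perpendicular) component.

In this linear toy setting the "manifold" is exactly a subspace, so PCA recovers it directly; the interesting question the paper studies is whether the same perpendicularity shows up for real image data, where the manifold must itself be estimated.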
