Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations

We consider ReLU networks with random weights, in which the dimension decreases at each layer. We show that for most such networks, most examples $x$ admit an adversarial perturbation at a Euclidean distance of $O\left(\frac{\|x\|}{\sqrt{d}}\right)$, where $d$ is the input dimension. Moreover, this perturbation can be found via gradient flow, as well as via gradient descent with sufficiently small steps. This result can be seen as an explanation for the abundance of adversarial examples, and for the fact that they can be found via gradient descent.
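The following is a minimal sketch, not the paper's construction or experiments, of the setting described above: a random fully connected ReLU network whose layer widths decrease, attacked by plain gradient descent on the scalar output until the predicted sign flips. The particular widths, step size, and iteration budget are illustrative assumptions; the final print compares the perturbation's relative norm to the $1/\sqrt{d}$ scale from the abstract.

```python
# Sketch: gradient-descent adversarial perturbation for a random ReLU network
# with decreasing widths. All hyperparameters below are illustrative assumptions.
import jax
import jax.numpy as jnp


def init_params(key, widths):
    """Random Gaussian weights, scaled by 1/sqrt(fan-in), for each layer."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_out, d_in)) / jnp.sqrt(d_in))
    return params


def net(params, x):
    """Fully connected ReLU network ending in a single scalar score."""
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(W @ h)
    return (params[-1] @ h)[0]


d = 1000                                 # input dimension
widths = [d, 500, 250, 100, 1]           # decreasing widths (assumed)
params = init_params(jax.random.PRNGKey(0), widths)

x0 = jax.random.normal(jax.random.PRNGKey(1), (d,))
sign0 = jnp.sign(net(params, x0))        # sign of the original prediction

# Plain gradient descent that pushes the signed score toward the other class.
grad_fn = jax.grad(lambda z: sign0 * net(params, z))
x = x0
for _ in range(1000):
    x = x - 0.05 * grad_fn(x)
    if jnp.sign(net(params, x)) != sign0:
        break

print("sign flipped:        ", bool(jnp.sign(net(params, x)) != sign0))
print("relative perturbation:", float(jnp.linalg.norm(x - x0) / jnp.linalg.norm(x0)))
print("1/sqrt(d):            ", 1.0 / float(jnp.sqrt(d)))
```

On a typical run of this sketch, the relative perturbation that flips the sign is on the order of $1/\sqrt{d}$, matching the scale in the statement above; this is an informal illustration, not a reproduction of the paper's analysis.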
