Origins of Low-dimensional Adversarial Perturbations

In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs) in classification. Unlike the classical setting, these perturbations are confined to a subspace of dimension k that is much smaller than the dimension d of the feature space. The case k = 1 corresponds to so-called universal adversarial perturbations (UAPs; Moosavi-Dezfooli et al., 2017). First, we consider binary classifiers under generic regularity conditions (including ReLU networks) and compute analytical lower bounds on the fooling rate of any subspace. These bounds make explicit the dependence of the fooling rate on the pointwise margin of the model (i.e., the ratio of the model's output to the ℓ2 norm of its gradient at a test point) and on the alignment of the given subspace with the gradients of the model w.r.t. its inputs. Our results provide a rigorous explanation for the recent success of heuristic methods for efficiently generating low-dimensional adversarial perturbations. Finally, we show that if a decision region is compact, then it admits a universal adversarial perturbation whose ℓ2 norm is √d times smaller than the typical ℓ2 norm of a data point. Our theoretical results are confirmed by experiments on both synthetic and real data.
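
To make the two quantities appearing in these bounds concrete, the following is a minimal PyTorch sketch (not the paper's method) for estimating the pointwise margin of a scalar-output binary classifier and the alignment of a k-dimensional subspace with the model's input gradients; in the spirit of the heuristics referenced above, the subspace is taken as the top-k right singular vectors of a matrix of stacked gradients. The names model, xs, and k are hypothetical placeholders, and the model is assumed to map a single flattened input in R^d to a scalar logit.

import torch

def pointwise_margin(model, x):
    # Pointwise margin as in the abstract: the model's output divided by the
    # L2 norm of its input gradient at the test point x.
    x = x.clone().detach().requires_grad_(True)
    out = model(x).squeeze()                       # scalar logit f(x)
    (grad,) = torch.autograd.grad(out, x)
    return (out / grad.flatten().norm()).item()

def gradient_subspace(model, xs, k):
    # Stack input gradients over a batch of test points and take the top-k
    # right singular vectors as an orthonormal candidate perturbation subspace.
    grads = []
    for x in xs:
        x = x.clone().detach().requires_grad_(True)
        out = model(x).squeeze()
        (g,) = torch.autograd.grad(out, x)
        grads.append(g.flatten())
    G = torch.stack(grads)                         # (n, d) matrix of gradients
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return Vh[:k]                                  # (k, d) orthonormal rows

def alignment(model, x, basis):
    # Fraction of the gradient's L2 energy captured by the span of `basis`.
    x = x.clone().detach().requires_grad_(True)
    out = model(x).squeeze()
    (g,) = torch.autograd.grad(out, x)
    g = g.flatten()
    return ((basis @ g).norm() / g.norm()).item()

In this picture, test points with small pointwise margins and subspaces that capture a large fraction of the gradients' energy are exactly the regime in which the bounds predict a high fooling rate.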

[1] Peter L. Bartlett et al. Adversarial Examples in Multi-Layer Random ReLU Networks, 2021, NeurIPS.

[2] Sébastien Bubeck et al. A Universal Law of Robustness via Isoperimetry, 2021, NeurIPS.

[3] Sébastien Bubeck et al. A single gradient step finds adversarial examples on random two-layers neural networks, 2021, NeurIPS.

[4] Sébastien Bubeck et al. A law of robustness for two-layers neural networks, 2020, COLT.

[5] Amit Daniely et al. Most ReLU Networks Suffer from ℓ2 Adversarial Perturbations, 2020, NeurIPS.

[6] Tong Zhang et al. Black-Box Adversarial Attack with Transferable Model-based Embedding, 2019, ICLR.

[7] Michael I. Jordan et al. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack, 2020, IEEE Symposium on Security and Privacy (SP).

[8] Ekin D. Cubuk et al. A Fourier Perspective on Model Robustness in Computer Vision, 2019, NeurIPS.

[9] Cho-Jui Hsieh et al. Convergence of Adversarial Training in Overparametrized Neural Networks, 2019, NeurIPS.

[10] Yiwen Guo et al. Subspace Attack: Exploiting Promising Subspaces for Query-Efficient Black-box Attacks, 2019, NeurIPS.

[11] Andrew Gordon Wilson et al. Simple Black-box Adversarial Attacks, 2019, ICML.

[12] David Rolnick et al. Complexity of Linear Regions in Deep Networks, 2019, ICML.

[13] Elvis Dohmatob et al. Generalized No Free Lunch Theorem for Adversarial Robustness, 2018, ICML.

[14] Hossein Mobahi et al. Predicting the Generalization Gap in Deep Networks with Margin Distributions, 2018, ICLR.

[15] Kilian Q. Weinberger et al. Low Frequency Adversarial Perturbation, 2018, UAI.

[16] Saeed Mahloujifar et al. The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure, 2018, AAAI.

[17] Tom Goldstein et al. Are adversarial examples inevitable?, 2018, ICLR.

[18] Jinfeng Yi et al. AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks, 2018, AAAI.

[19] Bernhard Schölkopf et al. First-Order Adversarial Vulnerability of Neural Networks and Input Dimension, 2018, ICML.

[20] Logan Engstrom et al. Black-box Adversarial Attacks with Limited Queries and Information, 2018, ICML.

[21] Hamza Fawzi et al. Adversarial vulnerability for any classifier, 2018, NeurIPS.

[22] Matthias Bethge et al. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models, 2017, ICLR.

[23] Christian Tjandraatmadja et al. Bounding and Counting Linear Regions of Deep Neural Networks, 2017, ICML.

[24] Valentin Khrulkov et al. Art of Singular Vectors and Universal Adversarial Perturbations, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Jinfeng Yi et al. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, 2017, AISec@CCS.

[26] Pascal Frossard et al. Analysis of universal adversarial perturbations, 2017, ArXiv.

[27] Seyed-Mohsen Moosavi-Dezfooli et al. Universal Adversarial Perturbations, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Jean-Noël Corvellec et al. Nonlinear Error Bounds via a Change of Function, 2017, J. Optim. Theory Appl.

[29] Kevin Gimpel et al. Gaussian Error Linear Units (GELUs), 2016.

[30] Razvan Pascanu et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.

[31] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.

[32] Gábor Lugosi et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[33] A. Rahimi et al. Uniform approximation of functions with random bases, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.

[34] A. Rahimi et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning, 2008, NIPS.

[35] E. Lieb et al. A general rearrangement inequality for multiple integrals, 1974.