The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

In this work, we study the implications of the implicit bias of gradient flow for generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters whose means have small pairwise correlations, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well but are highly vulnerable to adversarial examples. Our results hold even when the network has many more parameters than training examples. Although such overparameterization could in principle lead to harmful overfitting, we prove that the implicit bias of gradient flow prevents it. However, the same implicit bias also leads to non-robust solutions, susceptible to small adversarial $\ell_2$-perturbations, even though robust networks that fit the data exist.
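To make the setting concrete, the following is a minimal toy sketch (not from the paper): data is drawn from a few clusters whose means are random high-dimensional directions, and hence nearly orthogonal; a two-layer ReLU network is trained with full-batch gradient descent on the logistic loss as a discrete stand-in for gradient flow; clean test accuracy is then compared with accuracy under a one-step $\ell_2$-normalized gradient perturbation. All dimensions, noise levels, step counts, and the perturbation budget are arbitrary illustrative choices, and the attack is a generic FGM-style step rather than the paper's construction.

```python
# Toy illustration of the clustered-data setting: generalization vs. L2 robustness.
# All hyperparameters below are arbitrary; results vary run to run.
import torch

torch.manual_seed(0)
d, k, n_per, width = 500, 4, 25, 1000  # input dim, #clusters, train points per cluster, hidden width

# Nearly orthogonal cluster means: random Gaussian directions, rescaled to norm sqrt(d).
means = torch.randn(k, d)
means = means / means.norm(dim=1, keepdim=True) * d ** 0.5
labels_per_cluster = torch.tensor([1., -1., 1., -1.])  # arbitrary +/-1 labeling of clusters

def sample(n_per_cluster):
    X = (means.repeat_interleave(n_per_cluster, dim=0)
         + 0.5 * torch.randn(k * n_per_cluster, d))     # small within-cluster noise
    y = labels_per_cluster.repeat_interleave(n_per_cluster)
    return X, y

X_train, y_train = sample(n_per)
X_test, y_test = sample(200)

# Two-layer ReLU network with scalar output: x -> v . relu(W x).
W = (torch.randn(width, d) / d ** 0.5).requires_grad_()
v = (torch.randn(width) / width ** 0.5).requires_grad_()

def net(X):
    return torch.relu(X @ W.T) @ v

# Full-batch gradient descent on the logistic loss (discrete proxy for gradient flow).
opt = torch.optim.SGD([W, v], lr=0.05)
for step in range(2000):
    loss = torch.nn.functional.softplus(-y_train * net(X_train)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    clean_acc = (net(X_test).sign() == y_test).float().mean()

# One-step L2 attack: normalized gradient of the loss w.r.t. the inputs.
X_adv = X_test.clone().requires_grad_()
torch.nn.functional.softplus(-y_test * net(X_adv)).sum().backward()
delta = X_adv.grad / X_adv.grad.norm(dim=1, keepdim=True)
eps = 5.0  # small relative to the typical input norm ~sqrt(d) ~ 22
with torch.no_grad():
    adv_acc = (net(X_test + eps * delta).sign() == y_test).float().mean()

print(f"clean test accuracy: {clean_acc:.2f}, accuracy under ||delta||_2 = {eps}: {adv_acc:.2f}")
```

The expected qualitative behavior, under these assumptions, is high clean test accuracy alongside a sharp drop under perturbations whose norm is small compared to the data norm $\sqrt{d}$; the exact numbers depend on the random seed and the chosen budget.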
