Classification and Adversarial Examples in an Overparameterized Linear Model: A Signal Processing Perspective

State-of-the-art deep learning classifiers are heavily overparameterized with respect to the number of training examples and are observed to generalize well on "clean" data while being highly susceptible to infinitesimal adversarial perturbations. In this paper, we identify an overparameterized linear ensemble, using the "lifted" Fourier feature map, that demonstrates both of these behaviors. The input is one-dimensional, and the adversary is only allowed to perturb these inputs, not the non-linear features directly. We find that the learned model is susceptible to adversaries in an intermediate regime where classification generalizes but regression does not. Notably, the susceptibility arises despite the absence of model misspecification or label noise, which are commonly cited reasons for adversarial susceptibility. These results are extended theoretically to a random-Fourier-sum setup that exhibits double-descent behavior. In both feature setups, the adversarial vulnerability arises because of a phenomenon we term spatial localization: the predictions of the learned model are markedly more sensitive in the vicinity of training points than elsewhere. This sensitivity is a consequence of feature lifting and is reminiscent of the Gibbs and Runge phenomena from signal processing and functional analysis. Despite the adversarial susceptibility, we find that classification with these features can be easier than with the more commonly studied "independent features" models.
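
A minimal numerical sketch of the setup described above follows. It lifts one-dimensional inputs with a Fourier feature map, fits the minimum-ℓ2-norm interpolator, and compares how much a small input perturbation moves the prediction near a training point versus midway between training points (the spatial-localization effect). The feature count d, sample size n, label pattern, and perturbation size eps are illustrative choices, not parameters taken from the paper.

```python
# Sketch, not the paper's exact construction: Fourier feature lifting,
# minimum-norm interpolation, and a probe of prediction sensitivity.
import numpy as np

n, d, eps = 16, 257, 1e-3          # n training points, d >> n Fourier features

def fourier_features(x, d):
    """Lift scalar inputs x in [0, 1) to a d-dimensional Fourier feature vector."""
    k = np.arange(1, (d - 1) // 2 + 1)
    x = np.atleast_1d(x)[:, None]
    return np.hstack([np.ones((x.shape[0], 1)),
                      np.cos(2 * np.pi * k * x),
                      np.sin(2 * np.pi * k * x)])

# Regularly spaced 1-D training inputs with +/-1 labels from a slow sign pattern.
x_train = np.arange(n) / n
y_train = np.sign(np.cos(2 * np.pi * x_train) + 1e-12)

# Minimum-norm least-squares interpolator in the lifted feature space.
Phi = fourier_features(x_train, d)
w = np.linalg.pinv(Phi) @ y_train

def predict(x):
    return fourier_features(x, d) @ w

# Compare sensitivity to an eps-sized input perturbation near a training point
# versus between training points ("spatial localization").
near = x_train[3]
far = x_train[3] + 0.5 / n
for name, x0 in [("near training point", near), ("between training points", far)]:
    change = abs(predict(x0 + eps) - predict(x0))[0]
    print(f"{name}: |f(x+eps) - f(x)| = {change:.4f}")
```

Qualitatively, the change in the prediction should be much larger near the training point than between training points, reflecting that the minimum-norm Fourier interpolator concentrates its sharp variation around the training samples.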
