Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization