Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization