Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation