High Dimensional Classification via Regularized and Unregularized Empirical Risk Minimization: Precise Error and Optimal Loss.

This article provides, through theoretical analysis, an in-depth understanding of the classification performance of the empirical risk minimization framework, in both ridge-regularized and unregularized cases, when high dimensional data are considered. Focusing on the fundamental problem of separating a two-class Gaussian mixture, the proposed analysis provides a precise prediction of the classification error for datasets comprising numerous data vectors $\mathbf{x} \in \mathbb{R}^p$ of sufficiently large dimension $p$. This precise error depends on the loss function, the number of training samples, and the statistics of the mixture data model. It is further shown to hold beyond the Gaussian setting, under an additional non-sparsity condition on the data statistics. Building upon this quantitative error analysis, we identify the simple square loss as the optimal choice for high dimensional classification, in both ridge-regularized and unregularized cases, regardless of the number of training samples.
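To fix ideas, the following is a minimal sketch of the setting described above; the symbols $\boldsymbol{\mu}$, $\mathbf{\Sigma}$, $L$ and $\lambda$ are introduced here for illustration and follow the standard formulation of this problem rather than the paper's own notation. In a two-class Gaussian mixture, each training sample is drawn as $\mathbf{x}_i = y_i \boldsymbol{\mu} + \mathbf{z}_i$ with label $y_i \in \{\pm 1\}$ and noise $\mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$, and the (ridge-regularized) empirical risk minimizer over $n$ training pairs is
$$ \hat{\mathbf{w}} = \arg\min_{\mathbf{w} \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} L\!\left(y_i \, \mathbf{x}_i^\top \mathbf{w}\right) + \frac{\lambda}{2} \|\mathbf{w}\|_2^2, \qquad \hat{y}(\mathbf{x}) = \mathrm{sign}\!\left(\mathbf{x}^\top \hat{\mathbf{w}}\right), $$
where $L$ is the loss function and $\lambda \geq 0$ the ridge parameter, with $\lambda = 0$ recovering the unregularized case. Under this formulation, the square loss $L(t) = (1 - t)^2$ turns the program into ridge-regularized least squares, the loss identified above as optimal.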
