Sharp Asymptotics and Optimal Performance for Inference in Binary Models

We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as the Signed, Logistic, and Probit models) by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one, respectively. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.
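
As a rough illustration of the setup (a minimal sketch, not code from the paper), the snippet below simulates binary labels from the Logistic model with isotropic Gaussian features in the proportional regime n = delta * d, and compares two convex ERM estimators, least-squares and logistic loss, via their correlation with the true direction. All concrete values (d, delta, the step size, the iteration count) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 400                  # ambient dimension (assumed value)
delta = 5.0              # oversampling ratio n/d (assumed value)
n = int(delta * d)

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)        # ground-truth direction
X = rng.standard_normal((n, d))         # isotropic Gaussian features

# Logistic binary model: P(y_i = +1 | x_i) = 1 / (1 + exp(-<x_i, w*>))
p = 1.0 / (1.0 + np.exp(-(X @ w_star)))
y = np.where(rng.random(n) < p, 1.0, -1.0)

# Least-squares ERM: argmin_w sum_i (y_i - <x_i, w>)^2
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Logistic-loss ERM: argmin_w (1/n) sum_i log(1 + exp(-y_i <x_i, w>)),
# minimized here with plain gradient descent (fixed step size, chosen
# conservatively for this aspect ratio)
w_lr = np.zeros(d)
for _ in range(3000):
    margins = y * (X @ w_lr)
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
    w_lr -= 0.5 * grad

def correlation(a, b):
    """Cosine similarity between an estimate and the true direction."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"least-squares correlation: {correlation(w_ls, w_star):.3f}")
print(f"logistic-loss correlation: {correlation(w_lr, w_star):.3f}")
```

On typical draws the two correlations come out close to each other, consistent with the near-optimality of least-squares under the Logistic model stated in the abstract.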
