Performance of Bayesian linear regression in a model with mismatch

In this paper we analyze, for a model of linear regression with gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although Bayesian estimators have previously been analyzed in this high-dimensional regime for Bayes-optimal linear regression, where the correct posterior is used for inference, much less is known when there is a mismatch. Here we consider a model in which the responses are corrupted by gaussian noise and are known to be generated as linear combinations of the covariates, but the distributions of the ground-truth regression coefficients and of the noise are unknown. This regression task can be rephrased as a statistical mechanics model known as the Gardner spin glass, an analogy which we exploit. Using a leave-one-out approach we characterize the mean-square error of the regression coefficients. We also derive the log-normalizing constant of the posterior. Similar models have been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are much more straightforward. An interesting consequence of our analysis is that in the quadratic-loss case the performance of the Bayesian estimator is independent of a global “temperature” hyperparameter and matches that of the ridge estimator: sampling and optimizing are equally good.
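The last claim can be checked directly in finite dimensions: with a gaussian prior of precision lam and quadratic loss, the posterior at any inverse temperature beta is itself gaussian, so its mean equals the ridge solution (X^T X + lam I)^{-1} X^T y regardless of beta. Below is a minimal numerical sketch of this temperature-independence (the dimensions, noise level, and penalty lam are illustrative choices, not the paper's):

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 100                                  # samples and dimension, proportional regime
X = rng.standard_normal((n, d)) / np.sqrt(d)     # gaussian covariates
w_star = rng.standard_normal(d)                  # ground-truth regression coefficients
y = X @ w_star + 0.5 * rng.standard_normal(n)    # responses corrupted by gaussian noise
lam = 1.0                                        # gaussian-prior precision / ridge penalty

# Ridge estimator: argmin_w ||y - X w||^2 / 2 + lam * ||w||^2 / 2.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The posterior at inverse temperature beta is proportional to
# exp(-beta * (||y - X w||^2 / 2 + lam * ||w||^2 / 2)): a gaussian whose
# covariance depends on beta but whose mean does not.
for beta in (0.1, 1.0, 10.0):
    cov = np.linalg.inv(beta * (X.T @ X + lam * np.eye(d)))  # shrinks as beta grows
    mean = beta * cov @ (X.T @ y)                            # beta cancels exactly
    print(beta, np.max(np.abs(mean - w_ridge)))              # ~1e-13 for every beta

Only the posterior covariance depends on beta; the posterior mean, and hence the mean-square error of the posterior-mean estimator, does not, which is the sense in which sampling and optimizing are equally good.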

[1] D. Panchenko, et al., Strong Replica Symmetry in High-Dimensional Optimal Bayesian Inference, 2020, Communications in Mathematical Physics.

[2] A. Montanari, et al., Fundamental barriers to high-dimensional regression with convex penalties, 2019, The Annals of Statistics.

[3] Sumit Mukherjee, et al., Variational Inference in high-dimensional linear regression, 2021, J. Mach. Learn. Res.

[4] Ilias Zadik, et al., It was "all" for "nothing": sharp phase transitions for noiseless discrete channels, 2021, COLT.

[5] Christos Thrampoulidis, et al., Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions, 2020, AISTATS.

[6] S. Ghosal, et al., Bayesian inference in high-dimensional models, 2021, arXiv:2101.04491.

[7] Ilias Zadik, et al., The All-or-Nothing Phenomenon in Sparse Tensor PCA, 2020, NeurIPS.

[8] Jean Barbier, et al., Information theoretic limits of learning a sparse rule, 2020, NeurIPS.

[9] Jean Barbier, et al., All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation, 2020, NeurIPS.

[10] Yue M. Lu, et al., Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization, 2020, NeurIPS.

[11] C. Gerbelot, et al., Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or: How to Prove Kabashima's Replica Formula), 2020, arXiv.

[12] Florent Krzakala, et al., Asymptotic errors for convex penalized linear regression beyond Gaussian matrices, 2020, arXiv:2002.04372.

[13] Yoshiyuki Kabashima, et al., Macroscopic Analysis of Vector Approximate Message Passing in a Model Mismatch Setting, 2020, 2020 IEEE International Symposium on Information Theory (ISIT).

[14] Nicolas Macris, et al., Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation, 2017, IEEE Transactions on Information Theory.

[15] Galen Reeves, et al., All-or-Nothing Phenomena: From Single-Letter to High Dimensions, 2019, 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).

[16] Kolyan Ray, et al., Variational Bayes for High-Dimensional Linear Regression With Sparse Priors, 2019, Journal of the American Statistical Association.

[17] Galen Reeves, et al., The All-or-Nothing Phenomenon in Sparse Linear Regression, 2019, COLT.

[18] E. Candès, et al., A modern maximum-likelihood theory for high-dimensional logistic regression, 2018, Proceedings of the National Academy of Sciences.

[19] Nicolas Macris, et al., Optimal errors and phase transitions in high-dimensional generalized linear models, 2017, Proceedings of the National Academy of Sciences.

[20] Noureddine El Karoui, et al., On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators, 2018.

[21] Christos Thrampoulidis, et al., Precise Error Analysis of Regularized M-Estimators in High Dimensions, 2016, IEEE Transactions on Information Theory.

[22] David Gamarnik, et al., High Dimensional Regression with Binary Coefficients. Estimating Squared Error and a Phase Transition, 2017, COLT.

[23] Stephen G. Walker, et al., Empirical Bayes posterior concentration in sparse high-dimensional linear models, 2014, arXiv:1406.7718.

[24] Surya Ganguli, et al., An equivalence between high dimensional Bayes optimal inference and M-estimation, 2016, NIPS.

[25] Surya Ganguli, et al., Statistical Mechanics of Optimal Convex Inference in High Dimensions, 2016.

[26] Galen Reeves, et al., The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact, 2016, 2016 IEEE International Symposium on Information Theory (ISIT).

[27] Nicolas Macris, et al., The mutual information in random linear estimation, 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[28] Jelena Bradic, et al., Robustness in sparse high-dimensional linear models: Relative efficiency and robust approximate message passing, 2016.

[29] Christos Thrampoulidis, et al., Regularized Linear Regression: A Precise Analysis of the Estimation Error, 2015, COLT.

[30] A. van der Vaart, et al., Bayesian linear regression with sparse priors, 2014, arXiv:1403.0735.

[31] Karim Lounici, et al., Estimation and variable selection with exponential weights, 2014.

[32] Andrea Montanari, et al., High dimensional robust M-estimation: asymptotic variance via approximate message passing, 2013, Probability Theory and Related Fields.

[33] P. Bickel, et al., Optimal M-estimation in high-dimensional regression, 2013, Proceedings of the National Academy of Sciences.

[34] P. Bickel, et al., On robust regression with high-dimensional predictors, 2013, Proceedings of the National Academy of Sciences.

[35] D. Panchenko, The Sherrington-Kirkpatrick Model, 2013.

[36] Roman Vershynin, Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.

[37] Tim Austin, Mean field models for spin glasses, 2012.

[38] M. Talagrand, Mean Field Models for Spin Glasses, 2011.

[39] James G. Scott, et al., Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem, 2010, arXiv:1011.2333.

[40] Sylvia Richardson, et al., Evolutionary Stochastic Search for Bayesian model exploration, 2010, arXiv:1002.2706.

[41] Felix Abramovich, et al., MAP model selection in Gaussian regression, 2009, arXiv:0912.4387.

[42] H. Touchette, The large deviation approach to statistical mechanics, 2008, arXiv:0804.0327.

[43] H. Nishimori, et al., Spin Glass Identities and the Nishimori Line, 2008, arXiv:0805.0754.

[44] M. Yuan, et al., Efficient Empirical Bayes Variable Selection and Estimation in Linear Models, 2005.

[45] J. S. Rao, et al., Spike and slab variable selection: Frequentist and Bayesian strategies, 2005, arXiv:math/0505633.

[46] Larry Wasserman, et al., All of Statistics: A Concise Course in Statistical Inference, 2004.

[47] M. Shcherbina, et al., Rigorous Solution of the Gardner Problem, 2001, arXiv:math-ph/0112003.

[49] H. Nishimori, Statistical physics of spin glasses and information processing: an introduction, 2001.

[50] Dean Phillips Foster, et al., Calibration and empirical Bayes variable selection, 2000.

[51] E. George, The Variable Selection Problem, 2000.

[52] S. Kak, Information, physics, and computation, 1996.

[53] T. J. Mitchell, et al., Bayesian Variable Selection in Linear Regression, 1988.

[54] E. Gardner, The space of interactions in neural network models, 1988.

[55] E. Lieb, et al., On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation, 1976.