Variational Inference in high-dimensional linear regression

We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations [CD16], we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant of the posterior. Furthermore, we establish that under an additional “separation” condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this “separation” condition. Finally, we illustrate our results on concrete examples with specific design matrices.
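To make "naive mean-field approximation to the log-normalizing constant" concrete, the following minimal sketch compares the exact log-normalizing constant with its best product-form lower bound in one special case that can be solved in closed form. It assumes an i.i.d. N(0, tau2) prior on the coefficients and Gaussian noise with known variance sigma2; the abstract treats general product priors, so this choice, the function names, and the normalization of the likelihood are illustrative assumptions rather than the paper's setup. With A = X'X/sigma2 + I/tau2, the exact value involves log det A while the mean-field value replaces it by the sum of log A_ii, so the gap is nonnegative by Hadamard's inequality and vanishes when the design has orthogonal columns.

```python
import numpy as np


def _precision_and_shift(X, y, sigma2, tau2):
    # Posterior precision A and linear term b for the assumed Gaussian prior.
    p = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(p) / tau2
    b = X.T @ y / sigma2
    return A, b


def log_normalizer_exact(X, y, sigma2=1.0, tau2=1.0):
    """log of the integral of exp(-||y - X beta||^2 / (2 sigma2)) against
    the product prior N(0, tau2)^p (hypothetical helper, Gaussian case only)."""
    p = X.shape[1]
    A, b = _precision_and_shift(X, y, sigma2, tau2)
    _, logdet = np.linalg.slogdet(A)
    quad = b @ np.linalg.solve(A, b)
    return -y @ y / (2 * sigma2) - 0.5 * p * np.log(tau2) - 0.5 * logdet + 0.5 * quad


def log_normalizer_mean_field(X, y, sigma2=1.0, tau2=1.0):
    """Best lower bound over product measures in the same Gaussian case.
    The optimal factors are N(m_i, 1 / A_ii) with m = A^{-1} b."""
    p = X.shape[1]
    A, b = _precision_and_shift(X, y, sigma2, tau2)
    quad = b @ np.linalg.solve(A, b)
    sum_log_diag = np.sum(np.log(np.diag(A)))
    return -y @ y / (2 * sigma2) - 0.5 * p * np.log(tau2) - 0.5 * sum_log_diag + 0.5 * quad


# Small synthetic example (all sizes and the seed are arbitrary choices).
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
gap = log_normalizer_exact(X, y) - log_normalizer_mean_field(X, y)
print("mean-field gap:", gap)
```

In this toy case the gap depends only on how far the matrix A is from diagonal, which is one way to see why conditions on the design matrix enter when asking whether the mean-field value is correct to leading order.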

[1] Christian Borgs, et al. An $L^p$ theory of sparse graph convergence II: LD convergence, quotients and right convergence, 2014, arXiv:1408.0744.

[2] David M. Blei, et al. Frequentist Consistency of Variational Bayes, 2017, Journal of the American Statistical Association.

[3] Samuel Kotz, et al. Continuous univariate distributions: distributions in statistics, 1970.

[4] M. Wand, et al. Mean field variational Bayes for continuous sparse signal shrinkage: Pitfalls and remedies, 2014.

[5] Tung H. Pham, et al. Asymptotic normality and valid inference for Gaussian variational approximation, 2011, arXiv:1202.5183.

[6] Kolyan Ray, et al. Variational Bayes for High-Dimensional Linear Regression With Sparse Priors, 2019, Journal of the American Statistical Association.

[7] F. Augeri. A transportation approach to the mean-field approximation, 2019, arXiv:1903.08021.

[8] R. Vershynin. How Close is the Sample Covariance Matrix to the Actual Covariance Matrix?, 2010, arXiv:1004.3484.

[9] A. V. D. Vaart, et al. Bayesian linear regression with sparse priors, 2014, arXiv:1403.0735.

[10] Pierre Alquier, et al. On the properties of variational approximations of Gibbs posteriors, 2015, J. Mach. Learn. Res.

[11] B. Mallick, et al. Continuous shrinkage prior revisited: a collapsing behavior and remedy, 2020, arXiv:2007.02192.

[12] Sumit Mukherjee, et al. Universality of the mean-field for the Potts model, 2015, arXiv:1508.03949.

[13] Pierre Alquier, et al. Consistency of variational Bayes inference for estimation and model selection in mixtures, 2018, arXiv:1805.05054.

[14] Jun Yan. Nonlinear large deviations: Beyond the hypercube, 2017, The Annals of Applied Probability.

[15] Tyler H. McCormick, et al. Beyond Prediction: A Framework for Inference With Variational Approximations in Mixture Models, 2015, Journal of Computational and Graphical Statistics.

[16] J. S. Rao, et al. Spike and slab variable selection: Frequentist and Bayesian strategies, 2005, arXiv:math/0505633.

[17] F. Liang, et al. Nearly optimal Bayesian shrinkage for high-dimensional regression, 2017, Science China Mathematics.

[18] Arun K. Kuchibhotla, et al. Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression, 2018, arXiv:1804.02605.

[19] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[20] A. Montanari, et al. TAP free energy, spin glasses and variational inference, 2018, The Annals of Probability.

[21] Andrej Risteski, et al. Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective, 2018, STOC.

[22] M. Bálek, et al. Large Networks and Graph Limits, 2022.

[23] I. Johnstone, et al. On Consistency and Sparsity for Principal Components Analysis in High Dimensions, 2009, Journal of the American Statistical Association.

[24] Amir Dembo, et al. Nonlinear large deviations, 2014, arXiv:1401.3495.

[25] David J. C. MacKay, et al. Good Error-Correcting Codes Based on Very Sparse Matrices, 1997, IEEE Trans. Inf. Theory.

[26] Subhashis Ghosal, et al. Asymptotic normality of posterior distributions in high-dimensional linear models, 1999.

[27] D. Titterington, et al. Convergence properties of a general algorithm for calculating variational Bayesian estimates for a normal mixture model, 2006.

[28] J. Ormerod, et al. A variational Bayes approach to variable selection, 2017.

[29] Tim Austin, et al. The structure of low-complexity Gibbs measures on product spaces, 2018, The Annals of Probability.

[30] M. Mézard, et al. Spin Glass Theory And Beyond: An Introduction To The Replica Method And Its Applications, 1986.

[31] Yixin Wang, et al. Variational Bayes under Model Misspecification, 2019, NeurIPS.

[32] C. Newman, et al. The GHS and other correlation inequalities for a class of even ferromagnets, 1976.

[33] Wei Han, et al. Statistical Inference in Mean-Field Variational Bayes, 2019, arXiv:1911.01525.

[34] Pierre Alquier, et al. Concentration of tempered posteriors and of their variational approximations, 2017, The Annals of Statistics.

[35] Alan M. Frieze, et al. Quick Approximation to Matrices and Applications, 1999, Comb.

[36] V. Sós, et al. Convergent Sequences of Dense Graphs I: Subgraph Frequencies, Metric Properties and Testing, 2007, arXiv:math/0702004.

[37] Renhua Wu, et al. A large-scale screen for coding variants predisposing to psoriasis, 2013, Nature Genetics.

[38] C. Borgs, et al. Consistent nonparametric estimation for heavy-tailed sparse graphs, 2015, The Annals of Statistics.

[39] B. Mallick, et al. Fast sampling with Gaussian scale-mixture priors in high-dimensional regression, 2015, Biometrika.

[40] Xiangyu Chang, et al. Asymptotic Normality of Maximum Likelihood and its Variational Approximation for Stochastic Blockmodels, 2012, ArXiv.

[41] E. L. Lehmann, et al. Theory of point estimation, 1950.

[42] Xihong Lin, et al. Rare-variant association testing for sequencing data with the sequence kernel association test, 2011, American Journal of Human Genetics.

[43] Amir Dembo, et al. Regularity method and large deviation principles for the Erdős–Rényi hypergraph, 2021.

[44] Yufei Zhao, et al. An $L^p$ theory of sparse graph convergence I: Limits, sparse random graph models, and power law distributions, 2014, Transactions of the American Mathematical Society.

[45] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.

[46] Ronen Eldan, et al. Gaussian-width gradient complexity, reverse log-Sobolev inequalities and nonlinear large deviations, 2016, Geometric and Functional Analysis.

[47] Elchanan Mossel, et al. The Mean-Field Approximation: Information Inequalities, Algorithms, and Complexity, 2018, COLT.

[48] T. J. Mitchell, et al. Bayesian Variable Selection in Linear Regression, 1988.

[49] Andrea Montanari, et al. An Instability in Variational Inference for Topic Models, 2018, ICML.

[50] James G. Scott, et al. The horseshoe estimator for sparse signals, 2010.

[51] Robert G. Gallager, et al. Low-density parity-check codes, 1962, IRE Trans. Inf. Theory.

[52] B. Szegedy, et al. Szemerédi’s Lemma for the Analyst, 2007.

[53] Anderson Y. Zhang, et al. Theoretical and Computational Guarantees of Mean Field Variational Inference for Community Detection, 2017, The Annals of Statistics.

[54] Kolyan Ray, et al. Spike and slab variational Bayes for high dimensional logistic regression, 2020, NeurIPS.

[55] Edward C. Posner, et al. Random coding strategies for minimum entropy, 1975, IEEE Trans. Inf. Theory.

[56] V. Sós, et al. Convergent Sequences of Dense Graphs II. Multiway Cuts and Statistical Physics, 2012.

[57] M. Stephens, et al. Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies, 2012.

[58] David M. Blei, et al. Variational Inference: A Review for Statisticians, 2016, ArXiv.