High Dimensional Linear Regression via the R2-D2 Shrinkage Prior

We propose a new class of priors for linear regression, the R-square induced Dirichlet Decomposition (R2-D2) prior. The prior is induced by a Beta prior on the coefficient of determination; the total prior variance of the regression coefficients is then decomposed through a Dirichlet prior. We demonstrate both theoretically and empirically the advantages of the R2-D2 prior over a number of common shrinkage priors, including the Horseshoe, Horseshoe+, and Dirichlet-Laplace priors. Among these common shrinkage priors, the R2-D2 prior has the fastest concentration rate around zero and the heaviest tails; both properties are established from its marginal density, which is a Meijer G-function. We show that its Bayes estimator converges to the truth at a Kullback-Leibler super-efficient rate, attaining a sharper information-theoretic bound than existing common shrinkage priors. We also demonstrate that the R2-D2 prior yields a consistent posterior. The R2-D2 prior permits straightforward Gibbs sampling and is therefore computationally tractable. The proposed prior is further investigated in a mouse gene expression application.
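The construction described above can be illustrated with a short simulation. The abstract does not spell out the full hierarchy, so the following is only a minimal sketch under stated assumptions: R2 is drawn from Beta(a, b), the total variance multiplier is taken as omega = R2 / (1 - R2), a symmetric Dirichlet with concentration `a_pi` splits that total across coefficients, and each coefficient is given a double-exponential (Laplace) kernel whose variance equals sigma^2 * phi_j * omega. The function name `sample_r2d2_prior`, the parameter names, and the default values are hypothetical choices made for this illustration.

```python
import numpy as np


def sample_r2d2_prior(p, a=1.0, b=1.0, a_pi=0.5, sigma=1.0, rng=None):
    """Draw one coefficient vector from an R2-D2-style prior (illustrative sketch).

    Assumed hierarchy (not stated in the abstract):
      R2 ~ Beta(a, b), omega = R2 / (1 - R2),
      phi ~ Dirichlet(a_pi, ..., a_pi),
      beta_j ~ Laplace(0, sigma * sqrt(phi_j * omega / 2)).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Beta prior on the coefficient of determination.
    r2 = rng.beta(a, b)

    # Total prior variance of the coefficients, in units of sigma^2.
    omega = r2 / (1.0 - r2)

    # Dirichlet weights decompose the total variance across the p coefficients.
    phi = rng.dirichlet(np.full(p, a_pi))

    # A Laplace variable with scale s has variance 2 * s^2, so this choice of
    # scale gives Var(beta_j | phi, omega) = sigma^2 * phi_j * omega.
    scale = sigma * np.sqrt(phi * omega / 2.0)
    beta = rng.laplace(loc=0.0, scale=scale)

    return {"r2": r2, "omega": omega, "phi": phi, "beta": beta}


if __name__ == "__main__":
    draw = sample_r2d2_prior(p=10, rng=np.random.default_rng(0))
    print("R2:", draw["r2"])
    print("beta:", np.round(draw["beta"], 3))
```

Under these assumptions the conditional prior variances sum to sigma^2 * omega, which is the sense in which the Dirichlet weights decompose the total variance implied by the prior on R2. Small values of `a_pi` push most of the weight onto a few coefficients, giving sparse-looking draws.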
