On the sparse Bayesian learning of linear models

ABSTRACT This work is a re-examination of the sparse Bayesian learning (SBL) of linear regression models of Tipping (2001) in a high-dimensional setting with a sparse signal. We show that, in general, the SBL estimator does not recover the sparsity structure of the signal. To remedy this, we propose a hard-thresholded version of the SBL estimator that achieves, for orthogonal design matrices, the non-asymptotic estimation error rate of σ√(s log(p)/n), where n is the sample size, p is the number of regressors, σ is the regression model standard deviation, and s is the number of non-zero regression coefficients. We also establish that, with high probability, the estimator recovers the sparsity structure of the signal. In our simulations we found that the performance of thresholded SBL depends on the strength of the signal: with a weak signal, thresholded SBL performs poorly compared to the least absolute shrinkage and selection operator (lasso) of Tibshirani (1996), but it outperforms the lasso when the signal is strong.
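Under an orthogonal design the SBL evidence maximization decouples coordinate-wise, which makes the thresholded estimator easy to illustrate. The following Python snippet is a minimal simulation sketch, not the paper's code: the closed-form type-II maximum-likelihood update for the prior variances and the universal threshold σ√(2 log(p)/n) (our own choice of constant) are assumptions layered on the abstract's setup.

    import numpy as np

    rng = np.random.default_rng(0)

    # Setup from the abstract: n samples, p regressors, s non-zero coefficients,
    # noise standard deviation sigma, and an orthogonal design with X^T X = n I.
    n, p, s, sigma = 200, 100, 5, 1.0
    X = np.sqrt(n) * np.linalg.qr(rng.standard_normal((n, p)))[0]
    beta = np.zeros(p)
    beta[:s] = 3.0  # a "strong" signal; shrink this to probe the weak-signal regime
    y = X @ beta + sigma * rng.standard_normal(n)

    # With X^T X = n I, z_j = x_j^T y / n ~ N(beta_j, sigma^2 / n) and the SBL
    # marginal likelihood factorizes across coordinates, so the type-II ML prior
    # variance is gamma_j = max(0, z_j^2 - sigma^2/n) and the posterior mean
    # shrinks z_j by the factor gamma_j / (gamma_j + sigma^2/n).
    z = X.T @ y / n
    tau2 = sigma**2 / n
    gamma = np.maximum(0.0, z**2 - tau2)
    beta_sbl = z * gamma / (gamma + tau2)  # SBL posterior mean (no thresholding)

    # Hard-threshold at the universal level; the constant 2 is our assumption.
    thr = sigma * np.sqrt(2.0 * np.log(p) / n)
    beta_thr = np.where(np.abs(beta_sbl) > thr, beta_sbl, 0.0)

    print("support recovered:", set(np.flatnonzero(beta_thr)) == set(range(s)))
    print("l2 estimation error:", np.linalg.norm(beta_thr - beta))

Lowering the signal amplitude beta[:s] toward the threshold reproduces the weak-signal regime in which the abstract reports the lasso performing better.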

[1] G. Casella, et al., The Bayesian Lasso, 2008.

[2] Bhaskar D. Rao, et al., Sparse Bayesian learning for basis selection, 2004, IEEE Transactions on Signal Processing.

[3] Michael E. Tipping, et al., Analysis of Sparse Bayesian Learning, 2001, NIPS.

[4] J. S. Rao, et al., Spike and slab variable selection: Frequentist and Bayesian strategies, 2005, arXiv:math/0505633.

[5] James G. Scott, et al., The horseshoe estimator for sparse signals, 2010.

[6] Trevor Hastie, et al., Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[7] N. Meinshausen, et al., Lasso-type recovery of sparse representations for high-dimensional data, 2008, arXiv:0806.0145.

[8] Jon A. Wellner, et al., Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[9] Sylvia Richardson, et al., Evolutionary Stochastic Search for Bayesian model exploration, 2010, arXiv:1002.2706.

[10] G. H. Golub and C. F. Van Loan, Matrix Computations, 1996, Johns Hopkins University Press.

[11] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, 1996.

[12] R. O’Hara, et al., A review of Bayesian variable selection methods: what, how and which, 2009.

[13] Sara van de Geer, et al., Statistics for High-Dimensional Data, 2011.

[14] David Madigan, et al., Priors on the Variance in Sparse Bayesian Learning; the demi-Bayesian Lasso, 2008.

[15] Bhaskar D. Rao, et al., Latent Variable Bayesian Models for Promoting Sparsity, 2011, IEEE Transactions on Information Theory.

[16] P. Bickel, et al., Simultaneous analysis of lasso and Dantzig selector, 2008, arXiv:0801.1095.

[17] Michael E. Tipping, Sparse Bayesian Learning and the Relevance Vector Machine, 2001, Journal of Machine Learning Research.

[18] Martin J. Wainwright, et al., A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, 2009, NIPS.