On the Computational Complexity of High-Dimensional Bayesian Variable Selection

We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis-Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
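
The abstract does not specify the prior or the proposal. The following minimal sketch makes some illustrative assumptions. The active coefficients get a Zellner g-prior, and the noise variance gets the improper prior 1/σ². The model gets a truncated sparsity prior π(γ) ∝ p^(-κ|γ|) 1{|γ| ≤ s0}, where γ ⊆ {1, ..., p} is the set of selected covariates. Integrating out the coefficients and σ² gives, up to an additive constant,

log π(γ | y) = -(|γ|/2) log(1 + g) - κ|γ| log p - (n/2) log(1 + g(1 - R²_γ))   for |γ| ≤ s0,

where R²_γ = yᵀP_γ y / yᵀy and P_γ is the projection onto the columns of X indexed by γ. The sampler is a random-walk Metropolis-Hastings chain. Its proposal either adds or deletes one covariate, or swaps an active covariate for an inactive one. This proposal, the parameters g, kappa and s0, and the function names log_posterior, propose and mh_variable_selection are hypothetical. They are not taken from the paper's exact algorithm.

```python
import numpy as np


def log_posterior(gamma, X, y, g, kappa, s0):
    """Unnormalized log posterior of model `gamma` (boolean mask over p covariates).

    Assumed model (illustrative): Zellner g-prior on the active coefficients, prior
    1/sigma^2 on the noise variance, truncated sparsity prior p^(-kappa*|gamma|) 1{|gamma| <= s0}.
    """
    n, p = X.shape
    k = int(gamma.sum())
    if k > s0:
        return -np.inf  # outside the support of the truncated prior
    r2 = 0.0
    if k > 0:
        Xg = X[:, gamma]
        coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        fitted = Xg @ coef  # P_gamma y, the projection of y onto span(X_gamma)
        r2 = float(fitted @ fitted) / float(y @ y)
    return (-0.5 * k * np.log1p(g)
            - kappa * k * np.log(p)
            - 0.5 * n * np.log1p(g * (1.0 - r2)))


def propose(gamma, rng):
    """Symmetric proposal: flip one coordinate, or swap an active and an inactive index."""
    new = gamma.copy()
    if rng.random() < 0.5:
        j = rng.integers(gamma.size)
        new[j] = not new[j]  # add or delete one covariate
    else:
        active, inactive = np.flatnonzero(gamma), np.flatnonzero(~gamma)
        if active.size and inactive.size:  # otherwise the chain stays put
            new[rng.choice(active)] = False
            new[rng.choice(inactive)] = True
    return new


def mh_variable_selection(X, y, g, kappa, s0, n_iter, seed=0):
    """Random-walk Metropolis-Hastings over models; returns final model and visit counts."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(X.shape[1], dtype=bool)  # start from the empty model
    lp = log_posterior(gamma, X, y, g, kappa, s0)
    visits = {}
    for _ in range(n_iter):
        cand = propose(gamma, rng)
        lp_cand = log_posterior(cand, X, y, g, kappa, s0)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if np.log(rng.random()) < lp_cand - lp:
            gamma, lp = cand, lp_cand
        key = tuple(int(j) for j in np.flatnonzero(gamma))
        visits[key] = visits.get(key, 0) + 1
    return gamma, visits


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    n, p = 100, 20
    X = data_rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = 2.0  # simulated true support {0, 1, 2}
    y = X @ beta + data_rng.standard_normal(n)
    _, visits = mh_variable_selection(X, y, g=p**2, kappa=2.0, s0=5, n_iter=2000)
    print(max(visits, key=visits.get))  # most frequently visited model
```

Both moves are symmetric, so the acceptance probability reduces to the posterior ratio. The truncation needs no special handling: any proposal with more than s0 covariates has log posterior -inf and is rejected. On the simulated data in the demo, the chain spends most of its iterations on the true support {0, 1, 2}.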

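The proof described above bounds the spectral gap analytically by constructing canonical paths. For very small p, the gap can instead be computed exactly, which is useful as a sanity check on such bounds. The procedure enumerates every model with at most s0 covariates, assembles the Metropolis-Hastings transition matrix, and diagonalizes its symmetrized form. This sketch assumes an add/delete/swap proposal as one plausible choice. It also makes the chain lazy (holding with probability 1/2) so that all eigenvalues are nonnegative. The helpers `truncated_models` and `spectral_gap` are hypothetical. The cost grows with the number of models, the sum over k ≤ s0 of C(p, k), so this does not replace a canonical-path argument for large p.

```python
import itertools

import numpy as np


def truncated_models(p, s0):
    """All models (frozensets of covariate indices) with at most s0 active covariates."""
    return [frozenset(c) for k in range(s0 + 1)
            for c in itertools.combinations(range(p), k)]


def spectral_gap(log_post, p, s0):
    """Exact spectral gap of a lazy Metropolis-Hastings chain on the truncated model space.

    Assumed proposal (illustrative): with prob. 1/2 flip one uniformly chosen coordinate;
    otherwise swap a uniformly chosen active index with a uniformly chosen inactive one.
    """
    states = truncated_models(p, s0)
    index = {s: i for i, s in enumerate(states)}
    lp = np.array([log_post(s) for s in states], dtype=float)
    m = len(states)
    P = np.zeros((m, m))
    for i, s in enumerate(states):
        k = len(s)
        moves = [(s ^ {j}, 0.5 / p) for j in range(p)]  # add or delete one covariate
        if 0 < k < p:  # swap moves; each target has proposal prob. 1/(2 k (p - k))
            inactive = set(range(p)) - s
            moves += [((s - {a}) | {b}, 0.5 / (k * (p - k)))
                      for a in s for b in inactive]
        for t, q in moves:
            if t in index:  # proposals beyond the truncation are always rejected
                P[i, index[t]] += q * min(1.0, np.exp(lp[index[t]] - lp[i]))
        P[i, i] += 1.0 - P[i].sum()  # rejected and impossible moves keep the chain at s
    P = 0.5 * (np.eye(m) + P)  # lazy chain: all eigenvalues lie in [0, 1]
    pi = np.exp(lp - lp.max())
    pi /= pi.sum()
    # Detailed balance makes D^{1/2} P D^{-1/2} symmetric, with D = diag(pi).
    d = np.sqrt(pi)
    S = d[:, None] * P / d[None, :]
    eig = np.linalg.eigvalsh(0.5 * (S + S.T))  # eigenvalues in ascending order
    return 1.0 - eig[-2], pi, P


if __name__ == "__main__":
    # Toy target (an assumption): log pi(gamma) = -|gamma| on models with |gamma| <= 3, p = 6.
    gap, pi, _ = spectral_gap(lambda s: -1.0 * len(s), p=6, s0=3)
    eps = 0.25
    bound = np.log(1.0 / (eps * pi.min())) / gap
    print(f"spectral gap = {gap:.4f}, t_mix({eps}) <= {bound:.1f}")
```

The printed bound uses the standard inequality t_mix(ε) ≤ log(1/(ε π_min)) / gap for reversible lazy chains. Either a small gap or a very small π_min loosens it.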