Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-Dimensional Settings

Bayesian model selection procedures based on nonlocal alternative prior densities are extended to ultrahigh-dimensional settings and compared with other variable selection procedures using precision-recall curves. The procedures included in these comparisons are based on g-priors, the reciprocal lasso, the adaptive lasso, SCAD, and the minimax concave penalty. The use of precision-recall curves eliminates the sensitivity of our conclusions to the choice of tuning parameters. We find that Bayesian variable selection procedures based on nonlocal priors are competitive with all other procedures in a range of simulation scenarios, and we explain this favorable performance through a theoretical examination of their consistency properties. Under certain regularity conditions, we show that the nonlocal procedures are consistent for linear models even when the number of covariates p increases sub-exponentially with the sample size n. A model selection procedure based on Zellner's g-prior is also found to be competitive with penalized likelihood methods in identifying the true model, but the posterior distribution it induces on the model space is much more dispersed than the posterior distribution induced by the nonlocal prior methods. We investigate the asymptotic form of the marginal likelihood based on the nonlocal priors and show that it contains a unique term that cannot be derived from the other Bayesian model selection procedures. We also propose a scalable and efficient algorithm, Simplified Shotgun Stochastic Search with Screening (S5), to explore the enormous model space, and we show that S5 dramatically reduces computing time without losing the capacity to search interesting regions of the model space, at least in the simulation settings considered. The S5 algorithm is available in the R package BayesS5 on CRAN.
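As background for the nonlocal prior construction referenced above, a standard example is the product inverse-moment (piMOM) prior of Johnson and Rossell. Its density over the nonzero coefficients of a given model, with hyperparameters tau > 0 and r > 0, can be written as below; this is offered only as an illustrative form, and the precise prior specification and hyperparameter choices used in the paper should be taken from the paper itself.

    \pi(\beta \mid \tau, r) \;=\; \prod_{i} \frac{\tau^{r/2}}{\Gamma(r/2)} \, |\beta_i|^{-(r+1)} \exp\!\left( -\frac{\tau}{\beta_i^{2}} \right)

Because this density vanishes as any beta_i approaches zero, models that include negligible coefficients are penalized in the resulting marginal likelihood; roughly speaking, this is the mechanism behind the additional term mentioned in the abstract that local formulations (for example, g-priors) do not produce.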
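The abstract points readers to the BayesS5 package on CRAN. The following is a minimal usage sketch, assuming the package's main entry point S5() accepts a design matrix and response vector with its default prior and screening settings, and that result() summarizes the visited models; the field names hppm and marg.prob are assumptions and should be verified against the current package manual.

    library(BayesS5)

    set.seed(1)
    n <- 100; p <- 500                        # n observations, p candidate covariates (p >> n)
    X <- matrix(rnorm(n * p), n, p)           # simulated design matrix
    beta <- c(2, -1.5, 1, rep(0, p - 3))      # sparse true coefficient vector
    y <- as.vector(X %*% beta + rnorm(n))     # Gaussian response

    fit <- S5(X, y)                           # run the S5 search with default settings
    out <- result(fit)                        # summarize the search output (assumed helper)
    out$hppm                                  # highest posterior probability model (assumed field)
    out$marg.prob                             # marginal inclusion probabilities (assumed field)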
