Bayesian selection of best subsets via hybrid search

Over the past decades, variable selection for high-dimensional data has drawn increasing attention. When the number of predictors is large, model fitting and prediction become considerably more challenging. In this paper, we develop a new Bayesian method of best subset selection using a hybrid search algorithm that combines a deterministic local search with a stochastic global search. To reduce the computational cost of evaluating multiple candidate subsets at each update, we propose a novel strategy that calculates the exact marginal likelihoods of all neighbor models simultaneously in a single computation. In addition, we establish model selection consistency for the proposed method in the high-dimensional setting in which the number of possible predictors can increase faster than the sample size. A simulation study and a real data analysis are conducted to investigate the performance of the proposed method.
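To make the idea of a hybrid search concrete, the following is a minimal sketch under stated assumptions; it is not the algorithm developed in the paper. It assumes BIC as a stand-in score in place of the exact marginal likelihood, takes "neighbors" to be single-variable swaps, and refits each neighbor separately rather than evaluating all of them in one computation. The function names `bic`, `local_search`, and `hybrid_search` are hypothetical.

```python
import numpy as np


def bic(X, y, subset):
    """BIC of the least-squares fit on the given columns (lower is better).

    Assumption: BIC stands in for the marginal likelihood used in the paper.
    """
    n = len(y)
    cols = sorted(subset)
    if cols:
        Xs = X[:, cols]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    else:
        resid = y
    rss = float(resid @ resid)
    return n * np.log(rss / n) + len(cols) * np.log(n)


def local_search(X, y, subset):
    """Deterministic step: apply the best single swap until none improves."""
    p = X.shape[1]
    current = frozenset(subset)
    score = bic(X, y, current)
    while True:
        # Swap neighbors keep the subset size fixed: drop i, add j.
        neighbors = [(current - {i}) | {j}
                     for i in current
                     for j in range(p) if j not in current]
        if not neighbors:
            return current, score
        scores = [bic(X, y, s) for s in neighbors]
        k = int(np.argmin(scores))
        if scores[k] >= score - 1e-10:
            return current, score
        current, score = neighbors[k], scores[k]


def hybrid_search(X, y, n_iter=30, seed=0):
    """Stochastic step: toggle a random variable, then refine locally."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current, score = local_search(X, y, {int(rng.integers(p))})
    best, best_score = current, score
    for _ in range(n_iter):
        proposal = current ^ {int(rng.integers(p))}  # add or remove one variable
        if not proposal:
            continue  # skip the empty model in this sketch
        proposal, prop_score = local_search(X, y, proposal)
        # Metropolis-style acceptance, treating exp(-BIC/2) as the target.
        if rng.random() < np.exp(min(0.0, (score - prop_score) / 2)):
            current, score = proposal, prop_score
        if score < best_score:
            best, best_score = current, score
    return sorted(best), best_score
```

Calling `hybrid_search(X, y)` on an `n`-by-`p` design matrix and a response vector returns the selected column indices together with their score. The sketch refits every neighbor from scratch, which is the per-update cost that the simultaneous neighbor evaluation described above is meant to avoid.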
