Fast Laplace Approximation for Sparse Bayesian Spike and Slab Models

We consider the application of Bayesian spike-and-slab models to high-dimensional feature selection problems. To this end, we propose a simple yet effective fast approximate Bayesian inference algorithm based on Laplace's method. We exploit two efficient optimization methods, GIST [Gong et al., 2013] and L-BFGS [Nocedal, 1980], to obtain the mode of the posterior distribution. We then propose an ensemble Nystrom based approach to calculate the diagonal of the inverse Hessian at the mode, which yields the approximate posterior marginals in O(knp) time with k ≪ p. Furthermore, we provide a theoretical analysis of estimation consistency and approximation error bounds. With the posterior marginals of the model weights, we use quadrature integration to estimate the marginal posteriors of the selection probabilities and indicator variables for all features, which quantify the selection uncertainty. Our method not only retains the benefits of the Bayesian treatment (e.g., uncertainty quantification) but also possesses the computational efficiency and oracle properties of frequentist methods. Simulation shows that our method estimates selection probabilities and indicator variables as well as or better than alternative approximate inference methods such as VB and EP, with less running time. Extensive experiments on large real datasets demonstrate that our method often improves prediction accuracy over Bayesian automatic relevance determination, EP, and frequentist L1 type methods.
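The two-step structure described above (find the posterior mode, then read approximate marginal variances off the diagonal of the inverse Hessian at that mode) can be sketched on a toy problem. This is only an illustration under loud assumptions, not the paper's algorithm: a Gaussian prior with hypothetical scale `tau` stands in for the spike-and-slab prior, the likelihood is logistic, SciPy's `L-BFGS-B` routine is used for the mode, and a dense matrix inverse replaces the ensemble Nystrom approximation. The function names `neg_log_posterior` and `laplace_marginals` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_log_posterior(w, X, y, tau):
    """Negative log posterior (up to a constant) and its gradient.

    Assumed toy model: logistic likelihood with labels y in {0, 1} and an
    isotropic Gaussian prior N(0, tau^2 I) on the weights.
    """
    z = X @ w
    # log(1 + exp(z)) computed stably via logaddexp
    nll = np.sum(np.logaddexp(0.0, z) - y * z)
    value = nll + 0.5 * np.dot(w, w) / tau**2
    grad = X.T @ (expit(z) - y) + w / tau**2
    return value, grad


def laplace_marginals(X, y, tau=1.0):
    """Return the posterior mode and Laplace-approximate marginal variances."""
    p = X.shape[1]
    # Step 1: locate the mode with a limited-memory quasi-Newton method.
    res = minimize(neg_log_posterior, np.zeros(p), args=(X, y, tau),
                   jac=True, method="L-BFGS-B")
    w_map = res.x
    # Step 2: Hessian of the negative log posterior at the mode.
    prob = expit(X @ w_map)
    s = prob * (1.0 - prob)
    H = X.T @ (X * s[:, None]) + np.eye(p) / tau**2
    # Dense inverse, O(p^3): fine for a toy example only. This is the step
    # a low-rank approximation would replace when p is large.
    var = np.diag(np.linalg.inv(H))
    return w_map, var
```

Each weight's approximate marginal is then a Gaussian with mean `w_map[j]` and variance `var[j]`. The dense inverse is exactly the cost a scalable approach has to avoid in high dimensions, which is why it is flagged as a substitution here.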

[1] Miguel Lázaro-Gredilla, et al. Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning, 2011, NIPS.

[2] Thibault Helleputte, et al. Expectation Propagation for Bayesian Multi-task Feature Selection, 2010, ECML/PKDD.

[3] V. Sheffield, et al. Regulation of gene expression in the mammalian eye and its relevance to eye disease, 2006, Proceedings of the National Academy of Sciences.

[4] R. Fildes. Journal of the American Statistical Association: William S. Cleveland, Marylyn E. McGill and Robert McGill, The shape parameter for a two variable graph 83 (1988) 289-300, 1989.

[5] Daniel Hernández-Lobato, et al. Expectation Propagation for microarray data classification, 2010, Pattern Recognit. Lett.

[6] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.

[7] James C. Bezdek, et al. Convergence of Alternating Optimization, 2003, Neural Parallel Sci. Comput.

[8] Meland, et al. The Use of Molecular Profiling to Predict Survival after Chemotherapy for Diffuse Large-B-Cell Lymphoma, 2002.

[9] Nicholas Arcolano, et al. Approximation of Positive Semidefinite Matrices Using the Nystrom Method, 2011.

[10] M. Stephens, et al. Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies, 2012.

[11] James Glass, et al. MIT Computer Science and Artificial Intelligence Laboratory, 2015.

[12] T. Yen. A majorization–minimization approach to variable selection using spike and slab priors, 2010, 1005.0891.

[13] T. J. Mitchell, et al. Bayesian Variable Selection in Linear Regression, 1988.

[14] K. Roberts, et al. Thesis, 2002.

[15] José Miguel Hernández-Lobato. Balancing flexibility and robustness in machine learning: semi-parametric methods and sparse linear models, 2010.

[16] J. S. Rao, et al. Spike and slab variable selection: Frequentist and Bayesian strategies, 2005, math/0505633.

[17] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[18] Jieping Ye, et al. A General Iterative Shrinkage and Thresholding Algorithm for Non-convex Regularized Optimization Problems, 2013, ICML.

[19] Noah A. Smith, et al. Predicting Risk from Financial Reports with Regression, 2009, NAACL.

[20] Arthur E. Hoerl, et al. Ridge Regression: Biased Estimation for Nonorthogonal Problems, 2000, Technometrics.

[21] C. Caldwell. Mathematics of Computation, 1999.

[22] P. Cochat, et al. Et al, 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[23] Tommi S. Jaakkola, et al. Approximate Expectation Propagation for Bayesian Inference on Large-scale Problems, 2005.

[24] Yuan Qi, et al. Joint network and node selection for pathway-based genomic data analysis, 2013, Bioinform.

[25] Ameet Talwalkar, et al. Ensemble Nystrom Method, 2009, NIPS.

[26] J. Nocedal. Updating Quasi-Newton Matrices With Limited Storage, 1980.

[27] H. Rue, et al. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations, 2009.

[28] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.

[29] Hao Helen Zhang, et al. On the Adaptive Elastic-Net with a Diverging Number of Parameters, 2009, Annals of statistics.

[30] O. Bagasra, et al. Proceedings of the National Academy of Sciences, 1914, Science.

[31] Amr Ahmed, et al. Recovering time-varying networks of dependencies in social and biological studies, 2009, Proceedings of the National Academy of Sciences.