Bayesian model selection for high-dimensional data

Abstract

High-dimensional data, in which the number of features or covariates can even exceed the number of independent samples, are ubiquitous and are encountered regularly by statisticians in both academia and industry. Much of the classical research in statistics dealt with settings involving a small number of covariates. Owing to modern advances in data storage and computational power, high-dimensional data have come to occupy a central place in mainstream statistical research. In gene expression studies, for instance, it is not uncommon to encounter datasets with observations on at most a few hundred independent samples (subjects) and measurements on tens or hundreds of thousands of genes per sample. An important and common question that arises quickly is: “which of the available covariates are relevant to the outcome of interest?” This is the problem of variable selection (and, more generally, model selection) in statistics and data science. This chapter will provide an overview of some of the best-known model selection methods, along with several more recent ones. While frequentist methods will be discussed, Bayesian approaches will be given a more elaborate treatment. The frequentist framework for model selection is based primarily on penalization, whereas the Bayesian framework relies on prior distributions to induce shrinkage and sparsity. The chapter treats the Bayesian framework from objective and empirical Bayesian viewpoints, since priors in the high-dimensional setting are typically not based entirely on subjective prior beliefs. Computational scalability, an important practical aspect of high-dimensional model selection methods, will also be discussed.
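The following is a minimal sketch of the Bayesian approach to the question “which covariates are relevant?”. It is an illustration, not the chapter's method. It assumes a linear model with Zellner's g-prior on the coefficients, the unit-information choice g = n, and a uniform prior over all models. The functions `solve`, `r_squared`, `log_bayes_factor`, `posterior_inclusion` and `simulate` are hypothetical helpers written for this example. The sketch computes the closed-form Bayes factor of every model against the intercept-only model. It converts these into posterior model probabilities and then into marginal posterior inclusion probabilities. Because it enumerates all 2^p models, it is feasible only for small p. This is exactly why computational scalability matters in high dimensions.

```python
import math
import random
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r_squared(X, y, subset):
    """R^2 of least squares of y on the columns in `subset`, with an intercept."""
    n = len(y)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    tss = sum(v * v for v in yc)
    if not subset:
        return 0.0
    cols = []
    for j in subset:  # centring the columns absorbs the intercept
        mean_j = sum(row[j] for row in X) / n
        cols.append([row[j] - mean_j for row in X])
    XtX = [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]
    Xty = [sum(a * b for a, b in zip(u, yc)) for u in cols]
    beta = solve(XtX, Xty)
    rss = sum((yc[i] - sum(bk * col[i] for bk, col in zip(beta, cols))) ** 2
              for i in range(n))
    return 1.0 - rss / tss


def log_bayes_factor(r2, n, k, g):
    """Log Bayes factor of a k-covariate model vs. the intercept-only model
    under Zellner's g-prior: (1+g)^((n-1-k)/2) * (1 + g(1-R^2))^(-(n-1)/2)."""
    return 0.5 * (n - 1 - k) * math.log1p(g) - 0.5 * (n - 1) * math.log1p(g * (1.0 - r2))


def posterior_inclusion(X, y, g=None):
    """Enumerate all 2^p models; return marginal inclusion probabilities and
    posterior model probabilities under an (assumed) uniform model prior."""
    n, p = len(y), len(X[0])
    g = float(n) if g is None else g  # unit-information choice g = n (an assumption)
    log_bf = {}
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            log_bf[subset] = log_bayes_factor(r_squared(X, y, subset), n, k, g)
    top = max(log_bf.values())  # subtract the max for numerical stability
    weights = {m: math.exp(v - top) for m, v in log_bf.items()}
    total = sum(weights.values())
    probs = {m: w / total for m, w in weights.items()}
    pips = [sum(pr for m, pr in probs.items() if j in m) for j in range(p)]
    return pips, probs


def simulate(n=50, p=5, seed=0):
    """Hypothetical toy data: only covariates 0 and 2 affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0.0, 1.0) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    pips, probs = posterior_inclusion(X, y)
    for j, pip in enumerate(pips):
        print(f"x{j}: posterior inclusion probability {pip:.3f}")
```

On the simulated data, covariates 0 and 2 should receive inclusion probabilities close to one. A uniform model prior applies no multiplicity adjustment. Priors that adapt to the number of covariates, or stochastic search in place of enumeration, are natural next steps once p grows.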
