Bayesian Sparse Group Selection

This article proposes a Bayesian approach to the sparse group selection problem in regression. In this problem, the variables are partitioned into groups. Only a small number of groups are assumed to be active in explaining the response variable, and within each active group only a small number of variables are assumed to be active. We adopt a Bayesian hierarchical formulation in which each candidate group is associated with a binary indicator of whether the group is active, and each candidate variable within a group is associated with its own binary indicator. The sparse group selection problem can therefore be solved by sampling from the posterior distribution of these two layers of indicator variables. For posterior sampling we use a group-wise Gibbs sampler. We demonstrate the proposed method through simulation studies and real-data examples. The simulation results show that the proposed method outperforms the sparse group Lasso both in selecting the active groups and in identifying the active variables within the selected groups. Supplementary materials for this article are available online.
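To make the two-layer indicator structure and the group-wise Gibbs update concrete, here is a minimal, self-contained sketch. It is not the authors' exact model. It assumes a linear model with known noise variance `sigma2`, a normal prior N(0, `tau2`) on each active coefficient (integrated out analytically), a group indicator with prior probability `pi_group`, and within-group indicators that are Bernoulli(`pi_var`) conditioned on an active group containing at least one active variable. All function names and hyperparameter values are hypothetical. Each Gibbs step redraws one whole group's indicator vector from its full conditional by enumerating every within-group pattern:

```python
import itertools
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal(y, X, active, sigma2, tau2):
    """log p(y | active set), with beta_active ~ N(0, tau2 * I) integrated out, sigma2 known.

    Uses the matrix determinant lemma and the Woodbury identity, so only a
    k-by-k system (k = number of active variables) has to be solved.
    """
    n, k = len(y), len(active)
    base = -0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * sum(v * v for v in y) / sigma2
    if k == 0:
        return base
    cols = [[row[j] for row in X] for j in active]
    # A = X_a^T X_a / sigma2 + I / tau2 and b = X_a^T y.
    A = [[sum(p * q for p, q in zip(cols[r], cols[c])) / sigma2 + (1.0 / tau2 if r == c else 0.0)
          for c in range(k)] for r in range(k)]
    b = [sum(p * q for p, q in zip(cols[r], y)) for r in range(k)]
    L = cholesky(A)
    z = []  # forward solve L z = b, so that b^T A^{-1} b = z^T z
    for i in range(k):
        z.append((b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    log_det_A = 2.0 * sum(math.log(L[i][i]) for i in range(k))
    return (base - 0.5 * k * math.log(tau2) - 0.5 * log_det_A
            + 0.5 * sum(v * v for v in z) / sigma2 ** 2)


def log_prior_group(gamma_g, pi_group, pi_var):
    """Assumed two-layer prior on one group's variable indicators."""
    size, p = sum(gamma_g), len(gamma_g)
    if size == 0:  # group inactive, so every indicator in it is zero
        return math.log(1.0 - pi_group)
    # Active group: Bernoulli(pi_var) indicators conditioned on at least one being 1.
    return (math.log(pi_group) + size * math.log(pi_var) + (p - size) * math.log(1.0 - pi_var)
            - math.log(1.0 - (1.0 - pi_var) ** p))


def groupwise_gibbs(y, X, groups, n_iter, pi_group=0.2, pi_var=0.5, sigma2=1.0, tau2=10.0, seed=0):
    """Return estimated marginal inclusion probabilities for every variable."""
    rng = random.Random(seed)
    gamma = [0] * len(X[0])
    counts = [0] * len(gamma)
    for _ in range(n_iter):
        for g in groups:
            configs = list(itertools.product([0, 1], repeat=len(g)))
            log_w = []
            for cfg in configs:  # score every pattern of this group, others held fixed
                for j, v in zip(g, cfg):
                    gamma[j] = v
                active = [j for j, v in enumerate(gamma) if v]
                log_w.append(log_prior_group(cfg, pi_group, pi_var)
                             + log_marginal(y, X, active, sigma2, tau2))
            top = max(log_w)
            chosen = rng.choices(configs, weights=[math.exp(w - top) for w in log_w])[0]
            for j, v in zip(g, chosen):
                gamma[j] = v
        for j, v in enumerate(gamma):
            counts[j] += v
    return [c / n_iter for c in counts]


# Hypothetical toy data: 3 groups of 3 predictors; only x0 and x1 (both in group 0) matter.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(9)] for _ in range(60)]
y = [2.0 * row[0] - 1.5 * row[1] + data_rng.gauss(0.0, 1.0) for row in X]
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
probs = groupwise_gibbs(y, X, groups, n_iter=200)
print([round(p, 2) for p in probs])  # estimated marginal inclusion probabilities
```

Enumerating all 2^p_g patterns makes each group update an exact draw from its full conditional, which suits small groups. For large groups this cost grows exponentially, so a practical sampler would need a different within-group move; the sketch does not address that case.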
