BIVAS: A Scalable Bayesian Method for Bi-Level Variable Selection With Applications

Abstract: In this article, we consider a Bayesian bi-level variable selection problem in high-dimensional regression. In many practical situations, it is natural to assign a group membership to each predictor. For example, genetic variants can be grouped at the gene level, and in multitask learning a covariate shared across different tasks naturally forms a group. It is therefore of interest to select important groups as well as the important members within those groups. Existing Markov chain Monte Carlo methods are often computationally intensive and do not scale to large datasets. To address this problem, we consider variational inference for bi-level variable selection. In contrast to the commonly used mean-field approximation, we propose a hierarchical factorization that approximates the posterior distribution by exploiting the structure of bi-level variable selection. Moreover, we develop a computationally efficient and fully parallelizable algorithm based on this variational approximation. We further extend the method to model datasets from multitask learning. Comprehensive numerical results from both simulation studies and real data analysis demonstrate the advantages of BIVAS over existing methods in variable selection, parameter estimation, and computational efficiency. The method is implemented in the R package “bivas”, available at https://github.com/mxcai/bivas. Supplementary materials for this article are available online.
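
To make the bi-level structure described in the abstract concrete, the following is a minimal sketch of how coefficients that are sparse at both the group level and the within-group level could be simulated. It is an illustration only, not the BIVAS algorithm or the "bivas" package API: the function name `simulate_bilevel_coefficients`, the inclusion probabilities `p_group` and `p_member`, and the standard normal slab are all assumptions introduced here.

```python
import random


def simulate_bilevel_coefficients(n_groups, group_size, p_group, p_member, seed=0):
    """Draw coefficients that are sparse at the group and the member level.

    Hypothetical illustration: a coefficient is nonzero only if its group
    indicator AND its own member indicator are both switched on.
    """
    rng = random.Random(seed)
    group_active = []
    coefs = []
    for _ in range(n_groups):
        # Group-level indicator: is this group relevant at all?
        eta = rng.random() < p_group
        group_active.append(eta)
        group = []
        for _ in range(group_size):
            # Member-level indicator: is this predictor relevant within its group?
            gamma = rng.random() < p_member
            # Assumed slab: standard normal effect size.
            beta = rng.gauss(0.0, 1.0)
            group.append(beta if (eta and gamma) else 0.0)
        coefs.append(group)
    return group_active, coefs


group_active, coefs = simulate_bilevel_coefficients(
    n_groups=5, group_size=4, p_group=0.4, p_member=0.5, seed=1
)
for active, group in zip(group_active, coefs):
    print(active, [round(b, 2) for b in group])
```

Under this assumed setup, an inactive group contributes only zeros, while an active group may still contain zero entries. Recovering both levels of indicators from data is the bi-level selection task the abstract refers to.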
