Bayesian variable selection for finite mixture model of linear regressions

We propose a Bayesian variable selection method for fitting a finite mixture of linear regressions. The model assumes that the observations come from a heterogeneous population that is a mixture of a finite number of sub-populations. Within each sub-population, the response variable is explained by a linear regression on the predictor variables. When the number of predictor variables is large, we assume that only a small subset of them is important for explaining the response, and that this subset may differ from one sub-population to another. This gives rise to a complex variable selection problem. We propose to solve it within the Bayesian framework by introducing two sets of latent variables. The first set consists of membership indicators, which record the sub-population each observation comes from. The second set consists of inclusion/exclusion indicators, which record whether or not a predictor variable is included in the regression model of a given sub-population. Variable selection is then accomplished by sampling from the posterior distributions of the indicators and of the coefficients of the selected variables. We conduct simulation studies to demonstrate that the proposed method performs well in comparison with existing methods. We also analyze a real data set to further illustrate the usefulness of the proposed method.
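To make the two sets of latent variables concrete, the following is a minimal sketch that simulates data from a model of this kind. It is not the authors' sampler or prior specification. The function name `simulate_mixture_regression`, the equal mixing weights, the choice of three active predictors per sub-population, the common noise level `sigma`, and the normal distributions used for predictors and coefficients are all assumptions made for illustration.

```python
import numpy as np


def simulate_mixture_regression(n=200, p=10, K=2, n_active=3, sigma=0.5, seed=0):
    """Simulate from a finite mixture of sparse linear regressions.

    Hypothetical illustration: all distributional choices here are
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Membership indicators: z[i] is the sub-population of observation i.
    weights = np.full(K, 1.0 / K)  # assumed equal mixing proportions
    z = rng.choice(K, size=n, p=weights)

    # Inclusion/exclusion indicators: gamma[k, j] is True when predictor j
    # enters the regression model of sub-population k.
    gamma = np.zeros((K, p), dtype=bool)
    for k in range(K):
        active = rng.choice(p, size=n_active, replace=False)
        gamma[k, active] = True

    # Coefficients are exactly zero for excluded predictors.
    beta = np.where(gamma, rng.normal(0.0, 2.0, size=(K, p)), 0.0)

    # Each response uses the coefficient vector of its own sub-population.
    X = rng.normal(size=(n, p))
    mean = np.einsum("ij,ij->i", X, beta[z])
    y = mean + rng.normal(0.0, sigma, size=n)
    return X, y, z, gamma, beta


X, y, z, gamma, beta = simulate_mixture_regression()
```

In a posterior sampling scheme of the kind described above, `z` and `gamma` would be unknown and updated together with `beta`; here they are drawn once so that the structure of the model is visible.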
