Estimation and Feature Selection in Mixtures of Generalized Linear Experts Models

Mixture-of-Experts (MoE) models are conditional mixture models that have proven effective at modeling data heterogeneity in many statistical learning tasks, including prediction (regression and classification) and clustering. However, their estimation in high-dimensional problems remains challenging. We consider parameter estimation and feature selection in MoE models with different generalized linear expert models, and propose a regularized maximum likelihood estimation approach that efficiently encourages sparse solutions for heterogeneous data with high-dimensional predictors. The developed proximal-Newton EM algorithm embeds proximal Newton-type procedures that update the model parameters while monotonically increasing the penalized objective function, allowing efficient estimation and feature selection. An experimental study shows that the algorithms perform well at recovering the true sparse solutions, estimating parameters, and clustering heterogeneous regression data, compared with the main state-of-the-art competitors.
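
To make the approach concrete, here is a minimal sketch (in Python/NumPy, with illustrative function names, not the authors' implementation) of such a lasso-penalized EM for Gaussian linear experts. Two simplifications are made for brevity: the softmax gating network is replaced by constant mixing proportions, and, since the Gaussian expert log-likelihood is already quadratic in the coefficients, the proximal Newton-type M-step subproblem reduces to a responsibility-weighted lasso, solved exactly here by coordinate descent.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding, the proximal map of t * |.|."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso_cd(X, y, w, beta0, lam, n_iters=50):
    """Coordinate descent for the weighted lasso
        min_beta  0.5 * sum_i w_i * (y_i - x_i @ beta)^2 + lam * ||beta||_1.
    For Gaussian experts the quadratic model is exact, so this solves
    the proximal Newton-type M-step subproblem for one expert."""
    beta = beta0.copy()
    r = y - X @ beta
    col_norm2 = (w[:, None] * X**2).sum(axis=0) + 1e-12
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            r += X[:, j] * beta[j]            # form the partial residual
            rho = np.dot(w * X[:, j], r)      # weighted correlation
            beta[j] = soft_threshold(rho, lam) / col_norm2[j]
            r -= X[:, j] * beta[j]
    return beta

def penalized_em_moe(X, y, K=2, lam=1.0, n_em=100, seed=0):
    """Lasso-penalized EM for a Gaussian mixture of linear experts.
    Simplification vs. the paper: constant mixing proportions pi_k
    stand in for the (also penalized) softmax gating network."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = 0.1 * rng.standard_normal((K, p))  # expert coefficients
    sigma2 = np.full(K, y.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_em):
        # E-step: posterior component memberships (responsibilities)
        logp = np.empty((n, K))
        for k in range(K):
            resid = y - X @ beta[k]
            logp[:, k] = (np.log(pi[k])
                          - 0.5 * np.log(2.0 * np.pi * sigma2[k])
                          - 0.5 * resid**2 / sigma2[k])
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        tau = np.exp(logp)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: one exact proximal Newton update per expert
        for k in range(K):
            w = tau[:, k]
            beta[k] = weighted_lasso_cd(X, y, w, beta[k], lam)
            resid = y - X @ beta[k]
            sigma2[k] = max(np.dot(w, resid**2) / w.sum(), 1e-8)
        pi = tau.mean(axis=0)
    return beta, sigma2, pi

# Toy usage: two sparse regression regimes in 20 dimensions.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))
b1, b2 = np.zeros(20), np.zeros(20)
b1[:3], b2[:3] = 2.0, -2.0
z = rng.random(300) < 0.5
y = np.where(z, X @ b1, X @ b2) + 0.5 * rng.standard_normal(300)
beta_hat, _, _ = penalized_em_moe(X, y, K=2, lam=15.0, n_em=50)
```

For non-Gaussian generalized linear experts (e.g., logistic or Poisson), the same M-step would instead re-form a responsibility-weighted quadratic approximation of the expert log-likelihood at each inner iteration before applying the same penalized coordinate descent, which is the essence of a proximal Newton update.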
