Regularized Mixture of Experts for High-Dimensional Data

We consider Mixture of Experts (MoE) modeling for clustering heterogeneous regression data with possibly high-dimensional features, and propose a regularized maximum-likelihood estimation based on a dedicated EM algorithm that integrates coordinate ascent updates of the parameters. Unlike state-of-the-art regularized MLE approaches for MoE, the proposed modeling does not require an approximation of the regularization. The algorithm obtains sparse solutions automatically, without thresholding, and its coordinate ascent updates avoid matrix inversion, making it scalable. An experimental study shows the good performance of the algorithm in terms of recovering sparse solutions, density estimation, and clustering.
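To make the inversion-free M-step concrete, below is a minimal sketch, assuming each expert's M-step reduces to a responsibility-weighted Lasso subproblem; the function names, the weighting convention, and the update form are illustrative assumptions, not the authors' exact algorithm. The sparsity comes from the closed-form coordinate update itself (the soft-thresholding operator sets coefficients exactly to zero), rather than from post-hoc thresholding of small coefficients, and no matrix is ever inverted.

import numpy as np

def soft_threshold(z, gamma):
    # Soft-thresholding operator: S(z, gamma) = sign(z) * max(|z| - gamma, 0).
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def coordinate_ascent_pass(X, y, tau, beta, lam):
    # One coordinate-wise pass over the coefficients of a single expert.
    #   X    : (n, p) design matrix
    #   y    : (n,)   responses
    #   tau  : (n,)   EM responsibilities of this expert (weights; assumed convention)
    #   beta : (p,)   current coefficients, updated in place
    #   lam  : float  Lasso penalty level
    resid = y - X @ beta                       # full residual at the current beta
    for j in range(beta.shape[0]):
        resid += X[:, j] * beta[j]             # partial residual: remove coordinate j
        num = np.sum(tau * X[:, j] * resid)    # weighted correlation with partial residual
        den = np.sum(tau * X[:, j] ** 2)       # weighted squared norm of column j
        beta[j] = soft_threshold(num, lam) / den   # exact zero when |num| <= lam
        resid -= X[:, j] * beta[j]             # restore residual with the new value
    return beta

In an EM iteration one would alternate an E-step computing the responsibilities tau with a few such passes per expert; each pass costs O(np), which is what makes coordinate ascent attractive in high dimensions compared with solving weighted normal equations.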
