Skew t mixture of experts

Abstract: Mixture of experts (MoE) is a popular framework in statistics and machine learning for modeling heterogeneity in data for regression, classification and clustering. MoE models for continuous data are usually based on the normal distribution. However, the normal distribution is known to be unsuitable for data with asymmetric behavior, heavy tails or atypical observations. We introduce a new robust non-normal mixture of experts model based on the skew t distribution. The proposed skew t mixture of experts, named STMoE, addresses these limitations of the normal mixture of experts for possibly skewed, heavy-tailed and noisy data. We develop a dedicated expectation conditional maximization (ECM) algorithm to estimate the model parameters by monotonically maximizing the observed-data log-likelihood. We describe how the model can be used for prediction and for model-based clustering of regression data. Numerical experiments on simulated data show the effectiveness and robustness of the proposed model in fitting non-linear regression functions as well as in model-based clustering. The model is then applied to two real-world data sets: tone perception data, for musical data analysis, and temperature anomaly data, for the analysis of climate change. The results confirm the usefulness of the model for practical data analysis applications.
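To make the model class concrete, the following is a minimal sketch of how an STMoE conditional density could be evaluated. It assumes the standard univariate skew t density of Azzalini and Capitanio, a softmax gating network, and experts with linear means; the abstract does not spell out these choices, so they are assumptions here. All function and parameter names (`skew_t_pdf`, `softmax_gates`, `stmoe_pdf`, `alpha`, `beta`, `sigma`, `lam`, `nu`) are illustrative and not taken from the paper, and the sketch covers density evaluation only, not the ECM algorithm.

```python
import numpy as np
from scipy.stats import t as student_t


def skew_t_pdf(y, mu, sigma, lam, nu):
    """Univariate skew t density (Azzalini-Capitanio form).

    mu: location, sigma: scale (> 0), lam: skewness, nu: degrees of freedom.
    With lam = 0 this reduces to a location-scale Student t density.
    """
    d = (y - mu) / sigma
    shrink = np.sqrt((nu + 1.0) / (nu + d ** 2))
    return (2.0 / sigma) * student_t.pdf(d, df=nu) * student_t.cdf(
        lam * d * shrink, df=nu + 1.0
    )


def softmax_gates(r, alpha):
    """Gating probabilities (assumed softmax form).

    r: (n, q) gating covariates, alpha: (K, q) gating coefficients.
    Returns an (n, K) array whose rows sum to one.
    """
    logits = r @ alpha.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def stmoe_pdf(y, x, r, alpha, beta, sigma, lam, nu):
    """Conditional density of y given (x, r) under a K-expert skew t MoE.

    y: (n,) responses, x: (n, p) expert covariates, beta: (K, p) regression
    coefficients (linear expert means are an assumption of this sketch),
    sigma, lam, nu: (K,) per-expert scale, skewness, degrees of freedom.
    """
    gates = softmax_gates(r, alpha)                        # (n, K)
    means = x @ beta.T                                     # (n, K)
    dens = skew_t_pdf(y[:, None], means, sigma, lam, nu)   # (n, K)
    return (gates * dens).sum(axis=1)
```

Summing the log of `stmoe_pdf` over the observations gives the observed-data log-likelihood that the abstract's ECM algorithm is said to maximize; setting every `lam` to zero gives a t mixture of experts under the same assumptions.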
