An effective strategy for initializing the EM algorithm in finite mixture models

Finite mixture models are among the most popular tools for modeling heterogeneous data. The traditional approach to parameter estimation is to maximize the likelihood function, but direct optimization is often troublesome because of the complex likelihood structure. The expectation–maximization (EM) algorithm proves to be an effective remedy for this issue. However, the solution it produces is driven entirely by the choice of starting parameter values, which highlights the importance of an effective initialization strategy. Despite the efforts undertaken in this area, no uniformly best strategy has been found, and practitioners tend to ignore the issue, often obtaining misleading or erroneous results. In this paper, we propose a simple yet effective tool for initializing the EM algorithm in the mixture modeling setting. The idea is based on model averaging and proves efficient at detecting correct solutions even in cases where competing strategies perform poorly. The utility of the proposed methodology is demonstrated through a comprehensive simulation study and an application to a well-known classification dataset, with good results.
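
To make the general idea concrete, the following is a minimal sketch under stated assumptions, not the authors' exact procedure: several short EM runs are launched from random starts, their component labels are aligned, and their parameter estimates are combined with likelihood-based weights to seed one long EM run. The helper name `averaged_start`, the softmax-style weighting, and the mean-matching label alignment are illustrative choices, and scikit-learn's `GaussianMixture` is used here simply as an off-the-shelf EM engine.

```python
# Hypothetical model-averaging style initialization for EM in a Gaussian mixture.
# Illustrative only; the weighting and alignment steps are assumptions, not the
# paper's algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.mixture import GaussianMixture

def averaged_start(X, n_components, n_candidates=10, short_iter=5, seed=0):
    """Build EM starting values by averaging several short-run estimates."""
    rng = np.random.RandomState(seed)
    runs = []
    for _ in range(n_candidates):
        gm = GaussianMixture(n_components=n_components, max_iter=short_iter,
                             n_init=1, init_params="random",
                             random_state=rng.randint(1_000_000))
        gm.fit(X)
        runs.append(gm)

    # Likelihood-based averaging weights (softmax over mean log-likelihoods).
    ll = np.array([gm.score(X) for gm in runs])
    w = np.exp(ll - ll.max())
    w /= w.sum()

    # Align component labels of each run to the best run to avoid label
    # switching, matching components by the distance between mean vectors.
    ref = runs[int(np.argmax(ll))]
    means = np.zeros_like(ref.means_)
    weights = np.zeros_like(ref.weights_)
    for gm, wk in zip(runs, w):
        cost = np.linalg.norm(ref.means_[:, None, :] - gm.means_[None, :, :],
                              axis=2)
        _, perm = linear_sum_assignment(cost)
        means += wk * gm.means_[perm]
        weights += wk * gm.weights_[perm]
    weights /= weights.sum()
    return means, weights

if __name__ == "__main__":
    # Toy two-component example.
    rng = np.random.RandomState(1)
    X = np.vstack([rng.normal(0, 1, size=(100, 2)),
                   rng.normal(4, 1, size=(100, 2))])
    means0, weights0 = averaged_start(X, n_components=2)
    final = GaussianMixture(n_components=2, means_init=means0,
                            weights_init=weights0, max_iter=500).fit(X)
    print(final.means_)
```

Averaging across candidate starts is intended to smooth out poor individual short-run solutions before the long EM run; in practice the number of candidates, the weighting scheme, and the alignment rule would follow the paper's actual recommendations rather than the choices sketched here.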
