Examining the effect of initialization strategies on the performance of Gaussian mixture modeling

Mixture modeling is a popular technique for identifying unobserved subpopulations (i.e., components) within a data set, and Gaussian (normal) mixture models are the most widely used form. In general, the parameters of a Gaussian mixture cannot be estimated in closed form, so estimates are obtained through an iterative process. The most common estimation procedure is maximum likelihood via the expectation-maximization (EM) algorithm. Like many approaches for identifying subpopulations, finite mixture modeling can converge to locally optimal solutions, and the final parameter estimates depend on the starting values given to the EM algorithm. Starting values have been shown to significantly affect the quality of the solution, and researchers have proposed several approaches for selecting them. This study compares five techniques for obtaining starting values that are implemented in popular software packages. Their performance is assessed on four measures: (1) the ability to find the best observed solution, (2) settling on a solution that classifies observations correctly, (3) the number of local solutions found by each technique, and (4) the speed with which the starting values are obtained. On the basis of these results, recommendations are offered to users.
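The dependence of EM on its starting values can be illustrated with a minimal sketch, assuming univariate data and a simple random-start strategy (randomly chosen data points as initial means, equal mixing proportions, and the pooled variance for every component). This strategy and the function names `em_gmm_1d` and `best_of_random_starts` are hypothetical illustrations, not one of the five techniques compared in the study or the API of any package. Keeping the run with the highest log-likelihood mirrors measure (1), and counting the distinct final log-likelihoods loosely mirrors measure (3).

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, init_means, max_iter=200, tol=1e-6, min_var=1e-6):
    """Hypothetical sketch: EM for a univariate k-component Gaussian mixture.

    Returns (log_likelihood, weights, means, variances). The log-likelihood is the
    value computed in the E-step of the last iteration performed.
    """
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k        # assumption: equal mixing proportions to start
    means = list(init_means)       # the starting values whose effect is under study
    variances = [grand_var] * k    # assumption: pooled variance for every component
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: posterior membership probabilities and observed-data log-likelihood
        resp, ll = [], 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)  # guard against numerical underflow
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of proportions, means, and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                min_var, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            )
        # EM does not decrease the likelihood, so a negligible gain signals
        # convergence, possibly to a local rather than global optimum.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return ll, weights, means, variances


def best_of_random_starts(data, k, n_starts, seed=0):
    """Run EM from several random starts; return the best fit and distinct final log-likelihoods."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        init = rng.sample(data, k)  # k randomly chosen observations as starting means
        results.append(em_gmm_1d(data, k, init))
    best = max(results, key=lambda r: r[0])
    distinct = sorted({round(r[0], 3) for r in results})
    return best, distinct


if __name__ == "__main__":
    rng = random.Random(1)
    data = (
        [rng.gauss(0, 1) for _ in range(50)]
        + [rng.gauss(5, 1) for _ in range(50)]
        + [rng.gauss(10, 1) for _ in range(50)]
    )
    (ll, w, m, v), distinct = best_of_random_starts(data, k=3, n_starts=10)
    print("distinct final log-likelihoods:", distinct)
    print("best log-likelihood:", round(ll, 3), "means:", [round(mu, 2) for mu in sorted(m)])
```

If more than one distinct value is printed, different starts settled on different local solutions. Because the Gaussian mixture likelihood is unbounded as a component's variance shrinks toward zero, the `min_var` floor is only a crude safeguard; the winning fit should still be inspected rather than accepted on its log-likelihood alone.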
