K-means may perform as well as mixture model clustering but may also be much worse: comment on Steinley and Brusco (2011).

Steinley and Brusco (2011) presented the results of a large simulation study aimed at evaluating the cluster recovery of mixture model clustering (MMC), both when the number of clusters is known and when it is unknown. They drew rather strong conclusions from this study, especially regarding the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that KM may perform as well as MMC in certain situations, primarily those investigated by Steinley and Brusco. However, a weakness of their paper is that it does not investigate many important real-world situations in which theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides additional simulation results showing that KM may perform much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood, and I comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.
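
The equivalence between KM and a restricted mixture model can be sketched as follows. The notation is assumed here for illustration and is not taken from the article: $y_i$ is the $p$-dimensional observation for case $i = 1, \dots, n$, $z_{ik} \in \{0, 1\}$ is a hard assignment of case $i$ to cluster $k = 1, \dots, K$, and the restrictions are taken to be equal mixing proportions $\pi_k = 1/K$ and a common spherical covariance matrix $\Sigma_k = \sigma^2 I$. The article's own derivation may be set up differently.

```latex
% Classification log-likelihood of a Gaussian mixture with hard assignments z_{ik}
\log L_C(\theta, z) = \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik}
    \log\bigl[ \pi_k \, f(y_i \mid \mu_k, \Sigma_k) \bigr]

% Assumed restrictions: \pi_k = 1/K and \Sigma_k = \sigma^2 I for all k
\log L_C(\theta, z) = -n \log K - \frac{np}{2} \log\bigl(2 \pi \sigma^2\bigr)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik} \, \lVert y_i - \mu_k \rVert^2

% For any fixed \sigma^2 > 0, maximizing over (z, \mu) is therefore the same as
% minimizing the within-cluster sum of squares, which is the K-means criterion
W(z, \mu) = \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik} \, \lVert y_i - \mu_k \rVert^2

% Profiling out \sigma^2 gives \hat{\sigma}^2 = W / (np), so that
\log L_C = \text{const} - \frac{np}{2} \log W(z, \mu)
```

Under these assumptions the classification log-likelihood is a decreasing function of $W$, so the partition that maximizes it is the one that minimizes the K-means objective. Relaxing either restriction (unequal $\pi_k$, or cluster-specific or non-spherical $\Sigma_k$) breaks this correspondence, and these are the kinds of situations in which MMC would be expected to outperform KM.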

[1]  T. Caliński,et al.  A dendrite method for cluster analysis , 1974 .

[2]  Jan de Leeuw,et al.  A Latent Markov Model to Correct for Measurement Error , 1986 .

[3]  G. Celeux,et al.  A Classification EM algorithm for clustering and two stochastic versions , 1992 .

[4]  L. Collins,et al.  Latent Class Models for Stage-Sequential Dynamic Latent Variables , 1992 .

[5]  A. Raftery,et al.  Model-based Gaussian and non-Gaussian clustering , 1993 .

[6]  W. DeSarbo,et al.  A Review of Recent Developments in Latent Class Regression Models , 1994 .

[7]  R. Bagozzi Advanced Methods of Marketing Research , 1994 .

[8]  J. Vallino,et al.  A Review of Recent Developments in , 1997 .

[9]  Daniel S. Nagin,et al.  Analyzing developmental trajectories: A semiparametric, group-based approach , 1999 .

[10]  Jeroen K. Vermunt,et al.  A nonparametric random-coefficients approach: The latent class regression model , 2001 .

[11]  Jay Magidson,et al.  Latent class models for clustering: a comparison with K-means , 2002 .

[12]  Jay Magidson,et al.  Latent class modeling as a probabilistic extension of K-means clustering , 2002 .

[13]  J. Vermunt,et al.  Latent class cluster analysis , 2002 .

[14]  Jeroen K. Vermunt,et al.  Multilevel Latent Class Models , 2003 .

[15]  David Kaplan,et al.  The Sage handbook of quantitative methodology for the social sciences , 2004 .

[16]  Bengt Muthén,et al.  Latent Variable Analysis: Growth Mixture Modeling and Related Techniques for Longitudinal Data , 2004 .

[17]  M. Meulders,et al.  A conceptual and psychometric framework for distinguishing categories and dimensions. , 2005, Psychological review.

[18]  M. Neale,et al.  Distinguishing Between Latent Classes and Continuous Factors: Resolution by Maximum Likelihood? , 2006, Multivariate behavioral research.

[19]  J. Vermunt,et al.  Latent class models in longitudinal research , 2007 .

[20]  Scott Menard,et al.  Handbook of longitudinal research : design, measurement, and analysis , 2008 .

[21]  M. Brusco,et al.  Evaluating mixture modeling for clustering: recommendations and cautions. , 2011, Psychological methods.