Paper information - Latent class models for clustering: a comparison with K-means

Recent developments in latent class (LC) analysis and associated software to include continuous variables offer a model-based alternative to more traditional clustering approaches such as K-means. In this paper, the authors compare these two approaches using data simulated from a setting where true group membership is known. The authors choose a setting favourable to K-means by simulating data according to the assumptions made in both discriminant analysis (DISC) and K-means clustering. Since the information on true group membership is used in DISC but not in clustering approaches in general, the authors use the results obtained from DISC as a gold standard in determining an upper bound on the best possible outcome that might be expected from a clustering technique. The results indicate that LC substantially outperforms the K-means technique. A truly surprising result is that the LC performance is so good that it is virtually indistinguishable from the performance of DISC.
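The comparison described in the abstract can be sketched in a few lines of Python. This is a rough illustration only, not the authors' simulation: the group sizes, means, standard deviations, seed, and all function names (`simulate`, `latent_class`, `compare`, and so on) are assumptions made for the example. It draws two normal groups with independent variables and common variances, then scores plain K-means, a two-class LC model fitted by EM, and a DISC-style rule whose parameters are estimated from the true labels.

```python
import math
import random

def simulate(sizes=(200, 100), means=((0.0, 0.0), (3.0, 4.0)), sds=(1.0, 2.0), seed=1):
    """Draw two normal groups; all parameter values here are hypothetical."""
    rng = random.Random(seed)
    X, y = [], []
    for g, n in enumerate(sizes):
        for _ in range(n):
            X.append([rng.gauss(m, s) for m, s in zip(means[g], sds)])
            y.append(g)
    return X, y

def posteriors(x, priors, means, variances):
    """Posterior class probabilities under normals with a common diagonal covariance."""
    logp = []
    for p, mu in zip(priors, means):
        ll = math.log(p)
        for xj, mj, vj in zip(x, mu, variances):
            ll += -0.5 * math.log(2 * math.pi * vj) - (xj - mj) ** 2 / (2 * vj)
        logp.append(ll)
    top = max(logp)
    w = [math.exp(l - top) for l in logp]
    return [wi / sum(w) for wi in w]

def m_step(X, R):
    """Weighted ML estimates of priors, class means and common variances."""
    n, d, k = len(X), len(X[0]), len(R[0])
    nk = [sum(r[c] for r in R) for c in range(k)]
    priors = [nc / n for nc in nk]
    means = [[sum(r[c] * x[j] for x, r in zip(X, R)) / nk[c] for j in range(d)]
             for c in range(k)]
    variances = [sum(r[c] * (x[j] - means[c][j]) ** 2
                     for x, r in zip(X, R) for c in range(k)) / n for j in range(d)]
    return priors, means, variances

def assign(X, params):
    """Modal assignment: pick the class with the highest posterior."""
    out = []
    for x in X:
        post = posteriors(x, *params)
        out.append(post.index(max(post)))
    return out

def kmeans(X, centers, iters=100):
    """Plain Lloyd's algorithm on raw (unstandardized) variables."""
    labels = None
    for _ in range(iters):
        new = [min(range(len(centers)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
               for x in X]
        if new == labels:
            break
        labels = new
        for c in range(len(centers)):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def latent_class(X, start, iters=200):
    """Two-class LC cluster model fitted by EM, started from the given centers."""
    grand = [sum(col) / len(X) for col in zip(*X)]
    variances = [sum((x[j] - grand[j]) ** 2 for x in X) / len(X)
                 for j in range(len(grand))]
    params = ([0.5, 0.5], [list(s) for s in start], variances)
    for _ in range(iters):
        R = [posteriors(x, *params) for x in X]  # E-step
        params = m_step(X, R)                    # M-step
    return assign(X, params)

def error_rate(pred, truth):
    e = sum(p != t for p, t in zip(pred, truth)) / len(truth)
    return min(e, 1 - e)  # cluster labels are arbitrary, so allow a swap

def compare(seed=1):
    X, y = simulate(seed=seed)
    start = random.Random(seed + 1).sample(X, 2)  # same start for both clusterers
    one_hot = [[1.0 if g == c else 0.0 for c in range(2)] for g in y]
    disc = m_step(X, one_hot)  # DISC-style rule: parameters use the true labels
    return {"K-means": error_rate(kmeans(X, list(start)), y),
            "LC": error_rate(latent_class(X, start), y),
            "DISC": error_rate(assign(X, disc), y)}

if __name__ == "__main__":
    for method, err in compare().items():
        print(f"{method}: {err:.3f}")
```

In this sketch the LC and DISC rules share the same classification function and differ only in whether the parameters are estimated with or without the true labels, which is why DISC can serve as an upper bound for the clustering methods. Error rates are computed on the same data used for fitting, and a single EM start is used; a more careful study would use several random starts and repeated simulations.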

Jay Magidson | Jeroen K. Vermunt
