K-modes Clustering

We norm (defined as the limit of an Lp norm as p approaches zero). In Monte Carlo simulations, both K-modes and the latent class procedures (e.g., Goodman 1974) performed equally well in recovering a known underlying cluster structure. However, K-modes is an order of magnitude faster than the latent class procedures and suffers less from local optima. For data sets involving a large number of categorical variables, latent class procedures become extremely slow computationally and hence infeasible. We conjecture that, although latent class procedures might perform better than K-modes in some cases, K-modes could outperform latent class procedures in others. Hence, we recommend that the two approaches be used as "complementary" procedures in cluster analysis. We also present an empirical comparison of K-modes and latent class procedures, in which K-modes prevails.
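To make the idea concrete, the following is a minimal Python sketch of a K-modes-style loop, not the authors' implementation. It assumes that the loss reduces to counting mismatched categories between a record and its cluster mode (a simple-matching reading of an L0-type loss), and it alternates between assigning records to the nearest mode and recomputing each mode column by column, by analogy with K-means. The function names (`mismatches`, `column_modes`, `k_modes`) and the `init_modes` parameter are hypothetical, introduced only for illustration.

```python
from collections import Counter


def mismatches(a, b):
    """Count the positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent category in each column of `rows`."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, init_modes, n_iter=20):
    """Alternate assignment and mode updates until the labels stop changing.

    Assumption: starting modes are supplied by the caller; the source
    does not say how the procedure is initialized.
    """
    modes = list(init_modes)
    k = len(modes)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record goes to the mode with the fewest mismatches.
        new_labels = [
            min(range(k), key=lambda j: mismatches(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each cluster's mode is the column-wise most frequent category.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, modes = k_modes(data, init_modes=[data[0], data[3]])
total_loss = sum(mismatches(row, modes[lab]) for row, lab in zip(data, labels))
```

Like K-means, a loop of this kind can stop at a local optimum that depends on the starting modes, so in practice it would typically be rerun from several starting configurations.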

[1] H. Akaike et al. Information Theory and an Extension of the Maximum Likelihood Principle, 1973.

[2] J. Carroll et al. Interpoint Distance Comparisons in Correspondence Analysis, 1986.

[3] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[4] G. Schwarz. Estimating the Dimension of a Model, 1978.

[5] L. A. Goodman. Exploratory latent structure analysis using both identifiable and unidentifiable models, 1974.

[6] Paul E. Green et al. A Computational Study of Replicated Clustering with an Application to Market Segmentation, 1991.

[7] J. Carroll et al. A Feature-Based Approach to Market Segmentation via Overlapping K-Centroids Clustering, 1997.

[8] William R. Dillon et al. LADI: A Latent Discriminant Model for Analyzing Marketing Research Data, 1989.

[9] J. Carroll et al. An alternating combinatorial optimization approach to fitting the INDCLUS and generalized INDCLUS models, 1994.

[10] H. Bozdogan. Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions, 1987.

[11] P. Arabie et al. Cluster analysis in marketing research, 1994.

[12] R. P. McDonald et al. An index of goodness-of-fit based on noncentrality, 1989.

[13] B. Mirkin. A sequential fitting procedure for linear data analysis models, 1990.

[14] Rabikar Chatterjee et al. Joint Segmentation on Distinct Interdependent Bases with Categorical Data, 1996.

[15] John A. Hartigan et al. Clustering Algorithms, 1975.