Clustering mixed data

Mixture model clustering proceeds by fitting a finite mixture of multivariate distributions to the data; the fitted mixture density is then used to allocate each observation to one of the components. Common model formulations assume that the attributes are either all continuous or all categorical. In this paper, we consider options for model formulation in the more practical case of mixed data: multivariate data sets that contain both continuous and categorical attributes. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 352–361. DOI: 10.1002/widm.33
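
The allocation step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the formulation the paper settles on: it uses one continuous attribute (normal within each component) and one categorical attribute, assumes the two are independent within a component, and takes the component parameters as already fitted. The function names, dictionary keys, and parameter values are all hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probs(x_cont, x_cat, components):
    """Posterior probability of each component for one observation.

    Each component is a dict with hypothetical keys:
      weight    - mixing proportion
      mean, sd  - normal parameters for the continuous attribute
      cat_probs - dict mapping category -> probability
    Assumption: the two attributes are independent within a component.
    """
    joint = [
        c["weight"] * normal_pdf(x_cont, c["mean"], c["sd"]) * c["cat_probs"][x_cat]
        for c in components
    ]
    total = sum(joint)  # value of the mixture density at this observation
    return [j / total for j in joint]

def allocate(x_cont, x_cat, components):
    """Index of the component with the largest posterior probability."""
    probs = posterior_probs(x_cont, x_cat, components)
    return max(range(len(probs)), key=probs.__getitem__)

# Made-up "fitted" parameters for a two-component mixture (illustration only).
components = [
    {"weight": 0.5, "mean": 0.0, "sd": 1.0, "cat_probs": {"a": 0.8, "b": 0.2}},
    {"weight": 0.5, "mean": 4.0, "sd": 1.0, "cat_probs": {"a": 0.3, "b": 0.7}},
]

if __name__ == "__main__":
    # The continuous value 2.0 is equidistant from both means, so the
    # categorical attribute alone decides the allocation.
    print(allocate(2.0, "a", components), allocate(2.0, "b", components))
```

In this sketch the categorical attribute breaks ties that the continuous attribute cannot, which is the practical appeal of modelling both attribute types in one mixture. In a full analysis the parameters would be estimated, for example by the EM algorithm, rather than supplied by hand.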
