Model-Based Clustering, Discriminant Analysis, and Density Estimation

Cluster analysis is the automated search for groups of related observations in a data set. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as "How many clusters are there?", "Which clustering method should be used?" and "How should outliers be handled?". We outline a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology, and discuss recent developments in model-based clustering for non-Gaussian data, high-dimensional datasets, large datasets, and Bayesian estimation.
