Mixture modelling for cluster analysis

Cluster analysis via a finite mixture model approach is considered. Under this approach, the data are partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning each observation to the component for which its estimated posterior probability of membership is highest; that is, the ith cluster consists of the observations assigned to the ith component (i = 1, ..., g). The focus is on mixtures of normal components for the cluster analysis of data that can be regarded as continuous, but attention is also given to mixed data, where each observation consists of both continuous and discrete variables.
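
The two-step procedure described above (fit a g-component mixture, then assign each observation to its most probable component) can be illustrated with a minimal sketch. It assumes univariate data and normal components fitted by a plain EM iteration. The function names, the quantile-based starting values, the fixed iteration count, and the variance floor are all illustrative choices made here, not details taken from the article.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def posterior(x, props, means, variances):
    """Posterior probabilities that x belongs to each component."""
    logs = [math.log(p) + log_normal_pdf(x, m, v)
            for p, m, v in zip(props, means, variances)]
    top = max(logs)  # subtract the maximum to avoid underflow
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def fit_normal_mixture(data, g, n_iter=200):
    """Fit a g-component univariate normal mixture by EM (illustrative)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed starting values: evenly spaced sample quantiles as means,
    # the overall variance for every component, equal mixing proportions.
    means = [ordered[int((i + 0.5) / g * n)] for i in range(g)]
    centre = sum(data) / n
    overall_var = max(sum((x - centre) ** 2 for x in data) / n, 1e-6)
    variances = [overall_var] * g
    props = [1.0 / g] * g
    for _ in range(n_iter):
        # E-step: posterior probabilities under the current estimates.
        post = [posterior(x, props, means, variances) for x in data]
        # M-step: weighted updates of proportions, means and variances.
        for i in range(g):
            n_i = max(sum(row[i] for row in post), 1e-12)
            props[i] = n_i / n
            means[i] = sum(row[i] * x for row, x in zip(post, data)) / n_i
            spread = sum(row[i] * (x - means[i]) ** 2
                         for row, x in zip(post, data)) / n_i
            variances[i] = max(spread, 1e-6)  # floor guards against collapse
    return props, means, variances


def assign_clusters(data, props, means, variances):
    """Assign each observation to its highest-posterior component."""
    labels = []
    for x in data:
        probs = posterior(x, props, means, variances)
        labels.append(probs.index(max(probs)))
    return labels


if __name__ == "__main__":
    sample = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
    fitted = fit_normal_mixture(sample, g=2)
    print(assign_clusters(sample, *fitted))
```

In practice the result of EM depends on the starting values, so several starts are usually tried and the fit with the highest likelihood is kept; the single deterministic start used here only keeps the sketch short.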
