An entropy criterion for assessing the number of clusters in a mixture model

In this paper, we consider an entropy criterion for estimating the number of clusters arising from a mixture model. The criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo experiments, in which it compares favorably with other classical criteria.

Résumé: We propose an entropy criterion for assessing the number of classes in a partition, based on a mixture model of probability distributions. The criterion is deduced from a relation linking the likelihood and the classification likelihood of a mixture. Monte Carlo simulations illustrate its qualities relative to more classical criteria.
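The relation between the likelihood and the classification likelihood can be checked numerically. In standard mixture notation, let t_ik be the posterior probability that observation x_i belongs to component k. The mixture log-likelihood then splits as L(K) = C(K) + E(K), where C(K) = Σ_i Σ_k t_ik log(π_k f_k(x_i)) is a classification-likelihood term with hard labels replaced by posterior probabilities, and E(K) = −Σ_i Σ_k t_ik log t_ik ≥ 0 is the entropy of the fuzzy partition. The Python sketch that follows rests on several assumptions not stated in the abstract. It uses univariate Gaussian components and hand-picked parameters rather than EM fits. It also assumes a normalized form NEC(K) = E(K) / (L(K) − L(1)) for K > 1. The names `decompose` and `nec` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of the univariate normal N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def decompose(
    xs: Sequence[float],
    weights: Sequence[float],
    means: Sequence[float],
    sds: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (L, C, E) for a univariate Gaussian mixture at fixed parameters.

    L = sum_i log sum_k pi_k f_k(x_i)          (mixture log-likelihood)
    C = sum_i sum_k t_ik log(pi_k f_k(x_i))    (classification term)
    E = -sum_i sum_k t_ik log t_ik             (entropy of posteriors)
    """
    L = C = E = 0.0
    for x in xs:
        # log(pi_k f_k(x)) for every component k
        log_joint = [math.log(w) + log_normal_pdf(x, m, s)
                     for w, m, s in zip(weights, means, sds)]
        # log-sum-exp keeps the mixture density from underflowing
        top = max(log_joint)
        log_total = top + math.log(sum(math.exp(a - top) for a in log_joint))
        L += log_total
        for a in log_joint:
            log_t = a - log_total  # log of the posterior t_ik
            t = math.exp(log_t)
            C += t * a
            E -= t * log_t  # a posterior near 0 contributes about 0
    return L, C, E


def nec(e_k: float, l_k: float, l_1: float) -> float:
    """Assumed normalized entropy E(K) / (L(K) - L(1)), for K > 1."""
    gain = l_k - l_1
    if gain <= 0.0:
        raise ValueError("L(K) must exceed L(1) for the ratio to be meaningful")
    return e_k / gain


# Made-up toy sample with two well-separated groups.
xs = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 1.9, 2.1, 2.2, 1.8]

# K = 1: ML estimates are the sample mean and the (biased) standard deviation.
mean = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
L1, C1, E1 = decompose(xs, [1.0], [mean], [sd])

# K = 2: parameters chosen by hand, not fitted by EM.
L2, C2, E2 = decompose(xs, [0.5, 0.5], [-2.0, 2.0], [0.1414, 0.1414])

print(f"L(2) = {L2:.3f}, C(2) + E(2) = {C2 + E2:.3f}, E(2) = {E2:.2e}")
print(f"NEC(2) = {nec(E2, L2, L1):.2e}")
```

Because the two groups are far apart, the posteriors are essentially 0 or 1, so E(2) is close to zero and the ratio is small; overlapping groups would raise E(K) and hence the ratio. With a single component every posterior equals 1, so E(1) = 0 and L(1) = C(1). The log-sum-exp step only guards against numerical underflow and is not part of the criterion itself.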