On a Resampling Approach to Choosing the Number of Components in Normal Mixture

We consider the fitting of a g-component normal mixture to multivariate data. The problem is to test whether g is equal to some specified value versus some specified alternative value. This problem would arise, for example, in the context of a cluster analysis effected by a normal mixture model, where the decision on the number of clusters is undertaken by testing for the smallest value of g compatible with the data. A test statistic can be formed in terms of the likelihood ratio. Unfortunately, regularity conditions do not hold for the likelihood ratio statistic to have its usual asymptotic null distribution of chi-squared. One approach to the assessment of P-values with the use of this statistic is to adopt a resampling approach. An investigation is undertaken of the accuracy of P-values assessed in this manner.
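The resampling idea can be illustrated with a minimal sketch of a parametric bootstrap for the likelihood ratio statistic. It is an illustration under stated assumptions, not the authors' procedure: the data are univariate rather than multivariate, the EM fit uses a simple quantile-based start and a variance floor, and the P-value uses the common Monte Carlo convention (1 + number of bootstrap statistics at least as large as the observed one) / (B + 1). All function names (`fit_mixture`, `lr_statistic`, `bootstrap_p_value`) and the defaults (`n_boot=19`, `var_floor`) are hypothetical choices made for this example.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, g, n_iter=100, var_floor=1e-3):
    """Fit a univariate g-component normal mixture by EM.

    Returns (log-likelihood, [(proportion, mean, variance), ...]).
    Assumption: quantile-based starting means and a variance floor.
    """
    n = len(data)
    mean = sum(data) / n
    var = max(sum((x - mean) ** 2 for x in data) / n, var_floor)
    srt = sorted(data)
    params = [(1.0 / g, srt[int((k + 0.5) * n / g)], var) for k in range(g)]
    for _ in range(n_iter):
        # E-step: posterior probabilities of component membership.
        resp = []
        for x in data:
            w = [p * normal_pdf(x, m, v) for p, m, v in params]
            total = max(sum(w), 1e-300)  # guard against underflow
            resp.append([wk / total for wk in w])
        # M-step: weighted updates of proportions, means and variances.
        new_params = []
        for k in range(g):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:  # empty component: keep its previous values
                new_params.append(params[k])
                continue
            mu_k = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - mu_k) ** 2 for r, x in zip(resp, data)) / nk
            new_params.append((nk / n, mu_k, max(var_k, var_floor)))
        params = new_params
    loglik = sum(
        math.log(max(sum(p * normal_pdf(x, m, v) for p, m, v in params), 1e-300))
        for x in data
    )
    return loglik, params


def lr_statistic(data, g0, g1):
    """Return (-2 log lambda, fitted null parameters) for g0 versus g1."""
    ll0, params0 = fit_mixture(data, g0)
    ll1, _ = fit_mixture(data, g1)
    # EM may stop at a local maximum, so clamp small negative values to zero.
    return max(0.0, 2.0 * (ll1 - ll0)), params0


def simulate(params, n, rng):
    """Draw n observations from a fitted mixture."""
    weights = [p for p, _, _ in params]
    sample = []
    for _ in range(n):
        _, m, v = rng.choices(params, weights=weights)[0]
        sample.append(rng.gauss(m, math.sqrt(v)))
    return sample


def bootstrap_p_value(data, g0, g1, n_boot=19, seed=0):
    """Assess the P-value of the observed statistic by resampling under g0."""
    rng = random.Random(seed)
    stat_obs, params0 = lr_statistic(data, g0, g1)
    exceed = 0
    for _ in range(n_boot):
        boot = simulate(params0, len(data), rng)
        stat_b, _ = lr_statistic(boot, g0, g1)
        if stat_b >= stat_obs:
            exceed += 1
    return stat_obs, (1 + exceed) / (n_boot + 1)


# Usage: synthetic data from two well-separated components, testing g = 1 vs g = 2.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
data += [data_rng.gauss(6.0, 1.0) for _ in range(40)]
stat, p_value = bootstrap_p_value(data, g0=1, g1=2, n_boot=19)
print(stat, p_value)
```

Each bootstrap sample is generated from the mixture fitted under the null value of g, and both models are refitted to it, so the replicated statistics approximate the null distribution that the chi-squared theory fails to supply. With B replications the smallest attainable P-value is 1 / (B + 1), which is one reason the choice of B affects the accuracy of the assessed P-values.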
