A Graphical Technique for Determining the Number of Components in a Mixture of Normals

Abstract: When a population is assumed to be composed of a finite number of subpopulations, a natural model to choose is the finite mixture model. Often, however, the number of component distributions is unknown and must be estimated. This problem can be difficult; for instance, the density of a mixture of two normals is not bimodal unless the means are separated by at least 2 standard deviations. Hence the modality of the data per se can be an insensitive guide to the number of components. We demonstrate that a mixture of two normals divided by a normal density having the same mean and variance as the mixed density is always bimodal. This analytic result and other related results form the basis for a diagnostic and a test for the number of components in a mixture of normals. The density is estimated using a kernel density estimator. Under the null hypothesis, the proposed diagnostic can be approximated by a stationary Gaussian process. Under the alternative hypothesis, components in the mixtu...
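The bimodality result stated in the abstract can be checked numerically on a single example. The sketch below is only an illustration, not the paper's diagnostic: it uses the exact densities rather than a kernel density estimate, and the mixing weight, means, common standard deviation, grid, and helper names (`normal_pdf`, `count_modes`) are all assumptions chosen for this example. The two component means are one standard deviation apart, so the mixture density itself should have a single mode.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def count_modes(y):
    """Count interior local maxima of a curve sampled on an ordered grid."""
    d = np.diff(y)
    # A mode is a point where the curve stops rising and starts to fall.
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))


# Illustrative parameters (assumptions, not taken from the paper):
# equal weights, unit variances, means one standard deviation apart.
p, mu1, mu2, sigma = 0.5, 0.0, 1.0, 1.0

x = np.linspace(-6.0, 7.0, 2601)
mixture = p * normal_pdf(x, mu1, sigma**2) + (1 - p) * normal_pdf(x, mu2, sigma**2)

# Mean and variance of the mixture, used for the reference normal density.
mix_mean = p * mu1 + (1 - p) * mu2
mix_var = sigma**2 + p * (1 - p) * (mu2 - mu1) ** 2

# Mixture density divided by the normal density with matching mean and variance.
ratio = mixture / normal_pdf(x, mix_mean, mix_var)

print("modes of mixture density:", count_modes(mixture))
print("modes of ratio:", count_modes(ratio))
```

With these assumed parameters the mixture density shows one mode on the grid while the ratio shows two, which is consistent with the analytic claim. With real data the mixture density would have to be replaced by an estimate, as the abstract describes.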
