A global algorithm to estimate the expectations of the components of an observed univariate mixture

This paper deals with the unsupervised classification of univariate observations. Given a set of observations originating from a K-component mixture, we focus on estimating the component expectations. We propose an algorithm based on the minimization of the “K-product” (KP) criterion introduced in our previous work. We show that the global minimum of this criterion can be reached by first solving a linear system and then computing the roots of a polynomial of degree K. The KP global minimum provides a first, rough estimate of the component expectations; a nearest-neighbour classification then refines this estimate. Finally, simulations of various mixtures illustrate the relevance of the method: when the mixture components do not strongly overlap, the KP algorithm provides better estimates than the Expectation-Maximization (EM) algorithm.
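The two-stage procedure described above can be illustrated with a minimal Python/NumPy sketch. It assumes the KP criterion takes the form J(y_1, ..., y_K) = sum_n prod_k (x_n - y_k)^2. This form is consistent with the "linear system, then polynomial roots" route: writing prod_k (t - y_k) as a monic polynomial makes J quadratic in its coefficients. The function names kp_raw_estimate and nn_refine are hypothetical. Two further choices are assumptions of this sketch rather than necessarily the authors' choices: keeping the real parts when roots come out complex, and using a single nearest-neighbour pass followed by class means.

```python
import numpy as np


def kp_raw_estimate(x, K):
    """Raw estimate of the K component expectations (hypothetical helper).

    Assumes J(y_1..y_K) = sum_n prod_k (x_n - y_k)**2. With
    prod_k (t - y_k) = t**K + a[K-1] t**(K-1) + ... + a[0], J is quadratic
    in a, so its minimum solves a K x K linear system; the estimates are
    the roots of the resulting degree-K polynomial.
    """
    x = np.asarray(x, dtype=float)
    # Standardize for numerical conditioning; the minimizer of J is
    # equivariant under x -> c + s*x, so roots are mapped back at the end.
    c, s = x.mean(), x.std()
    z = (x - c) / s
    # Empirical power sums S_p = sum_n z_n**p for p = 0 .. 2K-1.
    S = np.array([np.sum(z ** p) for p in range(2 * K)])
    # Normal equations: sum_j S_{i+j} a_j = -S_{i+K}, i = 0..K-1 (Hankel system).
    H = np.array([[S[i + j] for j in range(K)] for i in range(K)])
    a = np.linalg.solve(H, -S[K:2 * K])
    # np.roots expects coefficients ordered from highest degree to lowest.
    roots = np.roots(np.concatenate(([1.0], a[::-1])))
    # Assumption of this sketch: keep real parts if roots are complex.
    return np.sort(c + s * roots.real)


def nn_refine(x, centers):
    """Assign each observation to its nearest raw estimate, then return the
    mean of each class (an empty class keeps its raw value)."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    refined = centers.copy()
    for k in range(len(centers)):
        members = x[labels == k]
        if members.size:
            refined[k] = members.mean()
    return refined


if __name__ == "__main__":
    # Synthetic, well-separated 3-component Gaussian mixture (illustrative only).
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(m, 0.5, 300) for m in (-5.0, 0.0, 5.0)])
    raw = kp_raw_estimate(x, 3)
    print("raw KP estimate:", raw)
    print("after NN refinement:", nn_refine(x, raw))
```

Standardizing before forming the power sums is a design choice of this sketch. The sums up to order 2K-1 grow quickly, so without it the Hankel matrix becomes ill-conditioned. A structured Hankel/Toeplitz solver could replace np.linalg.solve for larger K.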
