Learning Gaussian Mixture Models With Entropy-Based Criteria

In this paper, we address the problem of estimating the parameters of Gaussian mixture models. Although the expectation-maximization (EM) algorithm yields the maximum-likelihood (ML) solution, it is well known to be sensitive to the choice of starting parameters, and it may converge to the boundary of the parameter space. Furthermore, the resulting mixture depends on the number of components selected, and the optimal number of kernels may not be known beforehand. We introduce the use of the entropy of the probability density function (pdf) associated with each kernel to measure the quality of a given mixture model with a fixed number of kernels. We propose two methods to approximate the entropy of each kernel and a modification of the classical EM algorithm to find the optimal number of components of the mixture. Moreover, we use two stopping criteria: a novel global mixture entropy-based criterion called Gaussianity deficiency (GD) and one based on the minimum description length (MDL) principle. Our algorithm, called entropy-based EM (EBEM), starts with a single kernel and performs only splitting, selecting the worst kernel according to GD. We have successfully tested it in probability density estimation, pattern classification, and color image segmentation. The experimental results improve on those of other state-of-the-art model order selection methods.
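The idea of scoring a kernel by its entropy can be illustrated with a small sketch. For a fixed variance, the Gaussian has the largest differential entropy, so the gap between that maximum and an entropy estimate taken from the samples indicates how far the samples are from Gaussian. The abstract gives neither the GD formula nor the two entropy approximations, so everything below is an assumption made for illustration: the restriction to one dimension, the leave-one-out Parzen-window estimate with a Silverman rule-of-thumb bandwidth, the hard assignment of samples to kernels, and the hypothetical names `gaussian_max_entropy`, `parzen_entropy`, `entropy_gap`, and `worst_kernel`.

```python
import math
import random
import statistics


def gaussian_max_entropy(variance):
    """Differential entropy (nats) of a 1-D Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def parzen_entropy(xs):
    """Leave-one-out Parzen-window entropy estimate (nats) for 1-D samples.

    Assumption: Gaussian window with Silverman's rule-of-thumb bandwidth.
    This is a stand-in estimator, not necessarily one used in the paper.
    """
    n = len(xs)
    h = statistics.stdev(xs) * (4.0 / (3.0 * n)) ** 0.2
    norm = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(xs):
        # Density at xi estimated from all other samples (leave-one-out).
        dens = sum(
            math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for j, xj in enumerate(xs)
            if j != i
        )
        # Tiny floor guards against log(0) for an isolated sample.
        total += math.log(max(dens * norm, 1e-300))
    return -total / n


def entropy_gap(xs):
    """Maximum (Gaussian) entropy minus estimated entropy; near 0 if Gaussian.

    Assumption: an illustrative stand-in for a Gaussianity score, not the
    paper's definition of Gaussianity deficiency.
    """
    return gaussian_max_entropy(statistics.variance(xs)) - parzen_entropy(xs)


def worst_kernel(samples_per_kernel):
    """Index of the kernel whose samples look least Gaussian (largest gap)."""
    return max(
        range(len(samples_per_kernel)),
        key=lambda k: entropy_gap(samples_per_kernel[k]),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    unimodal = [rng.gauss(0.0, 1.0) for _ in range(400)]
    bimodal = [
        rng.gauss(-4.0, 1.0) if i % 2 == 0 else rng.gauss(4.0, 1.0)
        for i in range(400)
    ]
    print("gap, unimodal:", entropy_gap(unimodal))
    print("gap, bimodal: ", entropy_gap(bimodal))
    print("kernel to split:", worst_kernel([unimodal, bimodal]))
```

In this sketch a kernel that covers two well-separated clusters gets a clearly larger gap than a kernel whose samples are roughly Gaussian, which is the kind of signal a split-only procedure can use to choose which kernel to split next.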
