Unsupervised Learning of Finite Mixture Models

This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components, and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of pre-estimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments attest to the good performance of our approach.
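The abstract describes the algorithm only at a high level. As an illustration, the following is a minimal sketch, under stated assumptions, of how estimation and model selection can be interleaved for a 1-D Gaussian mixture. It starts from many components placed at random data points (so initialization is not delicate). It then runs a component-wise EM in which each component's weight is reduced by half its number of free parameters, so that weakly supported components are removed instead of shrinking toward a singular, zero-variance estimate. Finally, it repeatedly drops the weakest component and keeps the fit with the lowest penalized cost. The weight rule, the minimum-message-length-style cost in `mml_cost`, the variance floor, and all function names (`componentwise_em`, `fit_mixture`, and so on) are assumptions of this sketch, not details stated in the abstract.

```python
import math
import random


def gauss_pdf(x, mu, var):
    """Density of the 1-D Gaussian N(mu, var) evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_densities(alphas, cols):
    """Mixture density sum_j alpha_j * p(x_i | theta_j) for every point x_i."""
    scaled = [[a * c for c in col] for a, col in zip(alphas, cols)]
    # The tiny constant keeps log() and divisions finite if every density underflows.
    return [sum(t) + 1e-300 for t in zip(*scaled)]


def mml_cost(data, alphas, cols, n_params):
    """Assumed MML-style cost: parameter-coding penalty minus log-likelihood."""
    n, k = len(data), len(alphas)
    loglik = sum(math.log(d) for d in mixture_densities(alphas, cols))
    penalty = (n_params / 2) * sum(math.log(n * a / 12.0) for a in alphas)
    penalty += (k / 2) * math.log(n / 12.0) + k * (n_params + 1) / 2
    return penalty - loglik


def componentwise_em(data, alphas, mus, vs, n_params=2, tol=1e-5, max_iter=200):
    """Component-wise EM that deletes any component whose penalized weight reaches 0.

    alphas, mus and vs are modified in place; the final cost is returned.
    """
    n = len(data)
    cols = [[gauss_pdf(x, mu, v) for x in data] for mu, v in zip(mus, vs)]
    cost = mml_cost(data, alphas, cols, n_params)
    for _ in range(max_iter):
        m = 0
        while m < len(alphas):
            # E-step for component m, using the latest parameters of all components.
            mix = mixture_densities(alphas, cols)
            resp = [alphas[m] * c / d for c, d in zip(cols[m], mix)]
            support = sum(resp)
            # Assumed penalized weight update: subtract half the per-component
            # parameter count from the component's effective sample size.
            alphas[m] = max(0.0, support - n_params / 2) / n
            if alphas[m] == 0.0:
                # Too little support: annihilate the component rather than keep a
                # component that could shrink toward a zero-variance estimate.
                for lst in (alphas, mus, vs, cols):
                    del lst[m]
            else:
                # M-step for component m only.
                mus[m] = sum(r * x for r, x in zip(resp, data)) / support
                var = sum(r * (x - mus[m]) ** 2 for r, x in zip(resp, data)) / support
                vs[m] = max(var, 1e-6)  # small floor as an extra numerical guard
                cols[m] = [gauss_pdf(x, mus[m], vs[m]) for x in data]
                m += 1
            # Renormalize so the surviving weights sum to 1.
            total = sum(alphas)
            for j in range(len(alphas)):
                alphas[j] /= total
        new_cost = mml_cost(data, alphas, cols, n_params)
        converged = abs(cost - new_cost) < tol * abs(new_cost)
        cost = new_cost
        if converged:
            break
    return cost


def fit_mixture(data, k_max=6, k_min=1, seed=0):
    """Start from k_max components; prune down to k_min, keeping the lowest-cost fit."""
    rng = random.Random(seed)
    mean = sum(data) / len(data)
    var0 = sum((x - mean) ** 2 for x in data) / len(data) / 10.0
    mus = rng.sample(data, k_max)  # means at random data points: no careful init
    vs = [var0] * k_max
    alphas = [1.0 / k_max] * k_max
    best = None
    while alphas:
        cost = componentwise_em(data, alphas, mus, vs)
        if best is None or cost < best[0]:
            best = (cost, list(alphas), list(mus), list(vs))
        if len(alphas) <= k_min:
            break
        # Force out the weakest surviving component and refit the rest.
        j = min(range(len(alphas)), key=alphas.__getitem__)
        for lst in (alphas, mus, vs):
            del lst[j]
        total = sum(alphas)
        alphas[:] = [a / total for a in alphas]
    return best
```

Pruning the weakest component and refitting from the current parameters, instead of fitting each candidate number of components from scratch, is what keeps estimation and selection inside a single procedure in this sketch. The 1-D setting with two parameters per component (`n_params=2`: mean and variance) only keeps the example short. For multivariate components, `n_params` would count every free mean and covariance entry.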
