On the Number of Modes of a Gaussian Mixture

We consider a problem intimately related to the creation of maxima under Gaussian blurring: the number of modes of a Gaussian mixture in D dimensions. To our knowledge, no general answer to this question is known. We conjecture that if the components of the mixture share the same covariance matrix (or the same covariance matrix up to a scaling factor), then the number of modes cannot exceed the number of components. We demonstrate that the number of modes can exceed the number of components when the components are allowed to have arbitrary, different covariance matrices. We review related results from scale-space theory, statistics and machine learning, including a proof of the conjecture in 1D. We present a convergent, EM-like algorithm for mode finding and compare the modes found by starting the search from the centers of the mixture components with those found by a brute-force search. We also discuss applications to data reconstruction and clustering.
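The mode-finding procedure and the comparison with brute-force search can be illustrated with a small sketch. This is our own illustration in Python with NumPy, not the authors' code. It assumes the update obtained by setting the gradient of p(x) = sum_m w_m N(x; mu_m, C_m) to zero and rearranging into a fixed point, x <- (sum_m p(m|x) C_m^-1)^-1 sum_m p(m|x) C_m^-1 mu_m, where p(m|x) is the posterior responsibility of component m. With equal isotropic covariances this reduces to the mean-shift update x <- sum_m p(m|x) mu_m. The function names (fixed_point_mode, modes_from_centers, count_modes_1d_grid) are hypothetical, and the grid count is only a 1D stand-in for a brute-force search.

```python
import numpy as np

# Illustrative sketch (assumption): a fixed-point mode finder for a Gaussian
# mixture, started from every component mean, plus a 1D grid-based check.


def log_gaussian(x, mean, cov):
    """Log-density of N(mean, cov) at a single D-dimensional point x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Solve cov @ y = diff rather than forming cov^{-1} explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + x.shape[0] * np.log(2.0 * np.pi))


def responsibilities(x, weights, means, covs):
    """Posterior probabilities p(m | x), normalised stably in log space."""
    logs = np.array([np.log(w) + log_gaussian(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    logs -= logs.max()
    r = np.exp(logs)
    return r / r.sum()


def fixed_point_mode(x0, weights, means, covs, tol=1e-10, max_iter=10_000):
    """Iterate x <- (sum_m r_m C_m^-1)^-1 sum_m r_m C_m^-1 mu_m until it stops moving.

    A fixed point satisfies grad p(x) = 0, i.e. it is a critical point; this
    sketch does not check the Hessian, so it does not certify a maximum.
    """
    x = np.asarray(x0, dtype=float)
    precisions = [np.linalg.inv(c) for c in covs]
    for _ in range(max_iter):
        r = responsibilities(x, weights, means, covs)
        a = sum(rm * P for rm, P in zip(r, precisions))
        b = sum(rm * (P @ m) for rm, P, m in zip(r, precisions, means))
        x_new = np.linalg.solve(a, b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modes_from_centers(weights, means, covs, merge_tol=1e-6):
    """Run the iteration from every component mean and merge duplicate endpoints."""
    modes = []
    for m in means:
        x = fixed_point_mode(m, weights, means, covs)
        if all(np.linalg.norm(x - y) > merge_tol for y in modes):
            modes.append(x)
    return modes


def count_modes_1d_grid(weights, means, sds, num=200_001):
    """Brute-force check in 1D: count local maxima of p on a fine grid."""
    lo = min(m - 8 * s for m, s in zip(means, sds))
    hi = max(m + 8 * s for m, s in zip(means, sds))
    xs = np.linspace(lo, hi, num)
    p = sum(w * np.exp(-0.5 * ((xs - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
            for w, m, s in zip(weights, means, sds))
    # ">=" on the right so a maximum split across two equal grid values counts once.
    is_max = (p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:])
    return int(is_max.sum())


if __name__ == "__main__":
    # Two equal-weight, unit-variance components in 1D: the mixture is bimodal
    # exactly when the means are more than 2 standard deviations apart.
    for sep in (1.5, 3.0):
        weights = [0.5, 0.5]
        means = [np.array([0.0]), np.array([sep])]
        covs = [np.eye(1), np.eye(1)]
        n_fp = len(modes_from_centers(weights, means, covs))
        n_grid = count_modes_1d_grid(weights, [0.0, sep], [1.0, 1.0])
        print(f"separation {sep}: {n_fp} mode(s) from centers, {n_grid} by grid search")
```

In this equal-variance 1D example both searches agree: one mode at a separation of 1.5 and two at a separation of 3. Starting only from the component centers is cheap, but it can only find modes whose basins of attraction contain some center; a brute-force search is what reveals any additional modes, such as those that can arise when the covariance matrices differ.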
