A tight convex upper bound on the likelihood of a finite mixture

The likelihood function of a finite mixture model is non-convex, with multiple local maxima, so commonly used iterative algorithms such as EM converge to different solutions depending on the initial conditions. In this paper we ask: is it possible to assess how far we are from the global maximum of the likelihood? Since the likelihood of a finite mixture model can be made arbitrarily large by centering a Gaussian component on a single datapoint and shrinking its covariance, we constrain the problem by assuming that the parameters of the individual components are members of a large discrete set (e.g., estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million candidate means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization, and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected onto the discrete set) or by any other mixture estimation algorithm. For any dataset, our method finds a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood.
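The abstract does not spell out the construction, but the following reading is consistent with it: fix the K candidate component densities from the discrete set, and note that any k-component mixture built from those candidates corresponds to a mixing-weight vector with at most k nonzero entries. Relaxing to arbitrary weights over all K candidates can only increase the likelihood, and with the components fixed the log-likelihood is concave in the weights, so maximizing it over the probability simplex is a convex optimization problem whose value upper-bounds the constrained likelihood. Below is a minimal NumPy sketch under these assumptions; the function name and interface are illustrative, not taken from the paper.

    import numpy as np

    def convex_mixture_bound(F, n_iters=1000, tol=1e-10):
        """Upper-bound the best log-likelihood of any mixture whose
        components come from a fixed discrete candidate set.

        F : (n, K) array of positive densities, F[i, j] = f_j(x_i),
            candidate component j evaluated at datapoint i.

        With components fixed, L(q) = sum_i log(sum_j q_j F[i, j]) is
        concave in the mixing weights q, so its maximum over the
        simplex is a convex problem; the fixed-point (EM-style)
        update below converges to that global maximum.
        """
        n, K = F.shape
        q = np.full(K, 1.0 / K)          # start from uniform weights
        prev = -np.inf
        for _ in range(n_iters):
            mix = F @ q                  # (n,) mixture density per point
            ll = np.log(mix).sum()
            if ll - prev < tol:          # concave objective: monotone ascent
                break
            prev = ll
            # responsibility-averaged multiplicative update on the simplex
            q *= (F / mix[:, None]).mean(axis=0)
            q /= q.sum()                 # guard against numerical drift
        return ll, q

    # Toy usage (illustrative): 1-D data, candidate Gaussians on a grid
    # of means and standard deviations.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 0.5, 100)])
    mus, sigmas = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(0.3, 2.0, 10))
    mus, sigmas = mus.ravel(), sigmas.ravel()
    F = np.exp(-0.5 * ((x[:, None] - mus) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
    bound, weights = convex_mixture_bound(F)

The gap between this bound and the log-likelihood of a k-component solution (e.g., EM projected onto the candidate set) then quantifies, for that dataset, how far the solution can be from the global optimum over the discrete set.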
