A tight convex upper bound on the likelihood of a finite mixture

The likelihood function of a finite mixture model is non-convex, with multiple local maxima, so commonly used iterative algorithms such as EM converge to different solutions depending on the initial conditions. In this paper we ask: is it possible to assess how far we are from the global maximum of the likelihood? Since the likelihood of a finite mixture model can be made arbitrarily large by centering a Gaussian component on a single datapoint and shrinking its covariance, we constrain the problem by assuming that the parameters of the individual components are members of a large discrete set (e.g., estimating a mixture of two Gaussians where the means and variances of both Gaussians are members of a set of a million candidate means and variances). For this setting we show that a simple upper bound on the likelihood can be computed using convex optimization, and we analyze conditions under which the bound is guaranteed to be tight. This bound can then be used to assess the quality of solutions found by EM (where the final result is projected onto the discrete set) or by any other mixture estimation algorithm. For any dataset, our method finds a finite mixture model together with a dataset-specific bound on how far the likelihood of this mixture is from the global optimum of the likelihood.
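The abstract does not spell out the construction, but the following reading is consistent with it: fix the K candidate component densities from the discrete set, and note that any k-component mixture built from those candidates corresponds to a mixing-weight vector with at most k nonzero entries. Relaxing to arbitrary weights over all K candidates can only increase the likelihood, and with the components fixed the log-likelihood is concave in the weights, so maximizing it over the probability simplex is a convex optimization problem whose value upper-bounds the constrained likelihood. Below is a minimal NumPy sketch under these assumptions; the function name and interface are illustrative, not taken from the paper.

    import numpy as np

    def convex_mixture_bound(F, n_iters=1000, tol=1e-10):
        """Upper-bound the best log-likelihood of any mixture whose
        components come from a fixed discrete candidate set.

        F : (n, K) array of positive densities, F[i, j] = f_j(x_i),
            candidate component j evaluated at datapoint i.

        With components fixed, L(q) = sum_i log(sum_j q_j F[i, j]) is
        concave in the mixing weights q, so its maximum over the
        simplex is a convex problem; the fixed-point (EM-style)
        update below converges to that global maximum.
        """
        n, K = F.shape
        q = np.full(K, 1.0 / K)          # start from uniform weights
        prev = -np.inf
        for _ in range(n_iters):
            mix = F @ q                  # (n,) mixture density per point
            ll = np.log(mix).sum()
            if ll - prev < tol:          # concave objective: monotone ascent
                break
            prev = ll
            # responsibility-averaged multiplicative update on the simplex
            q *= (F / mix[:, None]).mean(axis=0)
            q /= q.sum()                 # guard against numerical drift
        return ll, q

    # Toy usage (illustrative): 1-D data, candidate Gaussians on a grid
    # of means and standard deviations.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 0.5, 100)])
    mus, sigmas = np.meshgrid(np.linspace(-5, 5, 50), np.linspace(0.3, 2.0, 10))
    mus, sigmas = mus.ravel(), sigmas.ravel()
    F = np.exp(-0.5 * ((x[:, None] - mus) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
    bound, weights = convex_mixture_bound(F)

The gap between this bound and the log-likelihood of a k-component solution (e.g., EM projected onto the candidate set) then quantifies, for that dataset, how far the solution can be from the global optimum over the discrete set.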
