Efficiently learning mixtures of two Gaussians

Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We provide a polynomial-time algorithm for this problem for the case of two Gaussians in $n$ dimensions (even if they overlap), with provably minimal assumptions on the Gaussians and polynomial data requirements. In statistical terms, our estimator converges at an inverse polynomial rate; no such estimator, even one running in exponential time, was previously known for this problem, even in one dimension. Our algorithm reduces the $n$-dimensional problem to the one-dimensional problem, where the method of moments is applied. One technical challenge is proving that noisy estimates of the first six moments of a univariate mixture suffice to recover accurate estimates of the mixture parameters, as conjectured by Pearson (1894), and that these estimates in fact converge at an inverse polynomial rate. As a corollary, we can efficiently perform near-optimal clustering: when the overlap between the Gaussians is small, one can accurately cluster the data, and when the Gaussians partially overlap, one can still accurately cluster those data points that lie outside the overlap region. A second consequence is a polynomial-time density estimation algorithm for arbitrary mixtures of two Gaussians, generalizing previous work on axis-aligned Gaussians (Feldman et al., 2006).
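To make the role of the first six moments concrete, the following is a minimal Python sketch. It is not the paper's algorithm: it only computes the first six raw moments of a univariate mixture of two Gaussians from its parameters and compares them with noisy empirical moments estimated from samples. The function names, the parameter values, and the sample size are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_raw_moments(mu, sigma, k_max=6):
    """Raw moments E[X^k], k = 0..k_max, of N(mu, sigma^2).

    Uses the standard recurrence
    E[X^k] = mu * E[X^(k-1)] + (k - 1) * sigma^2 * E[X^(k-2)].
    """
    m = [1.0, float(mu)]
    for k in range(2, k_max + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma**2 * m[k - 2])
    return np.array(m[: k_max + 1])


def mixture_raw_moments(w, mu1, sigma1, mu2, sigma2, k_max=6):
    """Raw moments of the mixture w*N(mu1, sigma1^2) + (1-w)*N(mu2, sigma2^2)."""
    return (w * gaussian_raw_moments(mu1, sigma1, k_max)
            + (1 - w) * gaussian_raw_moments(mu2, sigma2, k_max))


def sample_mixture(rng, n, w, mu1, sigma1, mu2, sigma2):
    """Draw n samples: each point comes from the first component with probability w."""
    from_first = rng.random(n) < w
    return np.where(from_first,
                    rng.normal(mu1, sigma1, n),
                    rng.normal(mu2, sigma2, n))


def empirical_raw_moments(samples, k_max=6):
    """Sample averages of X^k for k = 0..k_max (the noisy moment estimates)."""
    return np.array([np.mean(samples**k) for k in range(k_max + 1)])


if __name__ == "__main__":
    # Hypothetical mixture parameters, chosen only for illustration.
    params = dict(w=0.3, mu1=-1.0, sigma1=0.5, mu2=2.0, sigma2=1.5)
    rng = np.random.default_rng(0)
    exact = mixture_raw_moments(**params)
    for n in (1_000, 100_000):
        est = empirical_raw_moments(sample_mixture(rng, n, **params))
        print(n, np.max(np.abs(est - exact)))
```

Running the script prints, for each sample size, the largest gap between the exact and empirical moments. Recovering the mixture parameters from such noisy moment estimates is the harder step addressed in the paper and is not attempted here.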

[1]  K. Pearson.  Contributions to the Mathematical Theory of Evolution , 1894 .

[2]  A. Dinghas.  Über eine Klasse superadditiver Mengenfunktionale von Brunn-Minkowski-Lusternikschem Typus , 1957 .

[3]  L. Leindler.  On a Certain Converse of Hölder’s Inequality , 1972 .

[4]  A. Prékopa.  On logarithmic concave measures and functions , 1973 .

[5]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[6]  G. M. Tallis,et al.  Identifiability of mixtures , 1982, Journal of the Australian Mathematical Society. Series A. Pure Mathematics and Statistics.

[7]  C. F. J. Wu.  On the Convergence Properties of the EM Algorithm , 1983 .

[8]  Gene H. Golub,et al.  Matrix computations , 1983 .

[9]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[10]  A. F. Smith,et al.  Statistical analysis of finite mixture distributions , 1986 .

[11]  Umesh V. Vazirani,et al.  An Introduction to Computational Learning Theory , 1994 .

[12]  Ronitt Rubinfeld,et al.  On the learnability of discrete distributions , 1994, STOC '94.

[13]  Rajeev Motwani,et al.  Randomized Algorithms , 1995, SIGA.

[14]  B. Lindsay.  Mixture models : theory, geometry, and applications , 1995 .

[15]  M. Rudelson.  Random Vectors in the Isotropic Position , 1996, math/9608208.

[16]  Sanjoy Dasgupta,et al.  Learning mixtures of Gaussians , 1999, 40th Annual Symposium on Foundations of Computer Science.

[17]  Sanjoy Dasgupta,et al.  A Two-Round Variant of EM for Gaussian Mixtures , 2000, UAI.

[18]  V. Milman,et al.  Concentration Property on Probability Spaces , 2000 .

[19]  Geoffrey J. McLachlan,et al.  Finite Mixture Models , 2019, Annual Review of Statistics and Its Application.

[20]  Sanjeev Arora,et al.  Learning mixtures of arbitrary gaussians , 2001, STOC '01.

[21]  Santosh S. Vempala,et al.  A spectral algorithm for learning mixtures of distributions , 2002, 43rd Annual IEEE Symposium on Foundations of Computer Science.

[22]  Santosh S. Vempala,et al.  A spectral algorithm for learning mixture models , 2004, J. Comput. Syst. Sci.

[23]  Santosh S. Vempala,et al.  The Spectral Method for Mixture Models , 2004, Electron. Colloquium Comput. Complex.

[24]  Dimitris Achlioptas,et al.  On Spectral Learning of Mixtures of Distributions , 2005, COLT.

[25]  Jon Feldman,et al.  PAC Learning Axis-Aligned Mixtures of Gaussians with No Separation Assumption , 2006, COLT.

[26]  Santosh S. Vempala,et al.  The geometry of logconcave functions and sampling algorithms , 2007, Random Struct. Algorithms.

[27]  Santosh S. Vempala,et al.  The Spectral Method for General Mixture Models , 2008, SIAM J. Comput.

[28]  Santosh S. Vempala,et al.  Isotropic PCA and Affine-Invariant Clustering , 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[29]  Adam Tauman Kalai,et al.  Analysis of Perceptron-Based Active Learning , 2009, COLT.

[30]  Mikhail Belkin,et al.  Learning Gaussian Mixtures with Arbitrary Separation , 2009, ArXiv.