Improved Convergence Guarantees for Learning Gaussian Mixture Models by EM and Gradient EM

We consider the problem of estimating the means of a mixture of K Gaussians using the EM and gradient EM algorithms. Assuming the separation between the mixture components is at least of order √(log K), we prove convergence of both methods to the global optimum from an initialization region larger than those of previous works. Specifically, the initial guess for each component can be as far as (almost) half its distance to the nearest Gaussian. This is essentially the largest possible contraction region. Our second contribution is an improved sample size requirement for accurate estimation by EM and gradient EM. In previous works, the required number of samples had a quadratic dependence on the maximal separation between the K components, and the resulting error estimate increased linearly with this maximal separation. In this manuscript, we show that both quantities depend only logarithmically on the maximal separation.
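For concreteness, the following is a minimal sketch of the two update rules discussed above, specialized to a uniformly weighted mixture of unit-variance isotropic Gaussians in which only the means are estimated. The function names, the fixed step size, and the unit-variance assumption are illustrative choices for this sketch and are not taken from the paper.

```python
import numpy as np

def responsibilities(X, mu):
    """Posterior probability that each sample came from each component,
    for a uniform-weight mixture of unit-variance isotropic Gaussians."""
    # w_{ik} is proportional to exp(-||x_i - mu_k||^2 / 2)
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # shape (n, K)
    log_w = -0.5 * sq_dists
    log_w -= log_w.max(axis=1, keepdims=True)                        # numerical stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

def em_step(X, mu):
    """One EM iteration: E-step responsibilities, then the exact M-step
    (each mean becomes the responsibility-weighted average of the data)."""
    w = responsibilities(X, mu)                                      # (n, K)
    return (w.T @ X) / w.sum(axis=0)[:, None]

def gradient_em_step(X, mu, step_size=1.0):
    """One gradient EM iteration: instead of solving the M-step exactly,
    take a single gradient ascent step on the EM surrogate (Q-function)."""
    n = X.shape[0]
    w = responsibilities(X, mu)
    grad = (w.T @ X - w.sum(axis=0)[:, None] * mu) / n               # dQ/dmu, shape (K, d)
    return mu + step_size * grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mu = np.array([[-5.0, 0.0], [5.0, 0.0], [0.0, 5.0]])        # well-separated means
    labels = rng.integers(0, 3, size=3000)
    X = true_mu[labels] + rng.standard_normal((3000, 2))
    mu = true_mu + 1.5 * rng.standard_normal(true_mu.shape)          # perturbed initialization
    for _ in range(50):
        mu = em_step(X, mu)
    print(np.round(mu, 2))
```

In the toy run above, the initialization perturbs each true mean by well under half its distance to the nearest other component, which is the kind of starting point covered by the contraction region described in the abstract.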
