Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in O(√n) iterations

We analyze the classical EM algorithm for parameter estimation in symmetric two-component Gaussian mixtures in $d$ dimensions. We show that, even in the absence of any separation between the components, provided that the sample size satisfies $n=\Omega(d \log^3 d)$, the randomly initialized EM algorithm converges, with high probability, in at most $O(\sqrt{n})$ iterations to an estimate that is within $O((\frac{d \log^3 n}{n})^{1/4})$ in Euclidean distance of the true parameter, and hence within logarithmic factors of the minimax rate $(\frac{d}{n})^{1/4}$. Both the nonparametric statistical rate and the sublinear convergence rate are direct consequences of the zero Fisher information in the worst case. Refined pointwise guarantees beyond the worst-case analysis, as well as convergence to the MLE, are also established under mild conditions. This improves upon the previous result of Balakrishnan et al. \cite{BWY17}, which requires strong conditions on both the separation of the components and the quality of the initialization, and that of Daskalakis et al. \cite{DTZ17}, which requires sample splitting and restarting the EM iteration.
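For concreteness, in the symmetric model $\frac{1}{2}N(\theta^*, I_d) + \frac{1}{2}N(-\theta^*, I_d)$ with known identity covariance, the E-step and M-step collapse into the single closed-form update $\theta_{t+1} = \frac{1}{n}\sum_{i=1}^n \tanh(\langle X_i, \theta_t\rangle)\, X_i$, which is the standard formulation in this line of work. The sketch below is a minimal illustration of this iteration with random initialization, capped at $O(\sqrt{n})$ iterations as in the theorem; the initialization radius, the synthetic data, and the problem sizes are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def em_symmetric_gmm(X, n_iter=None, rng=None):
    """Randomly initialized EM for the mixture 0.5*N(theta, I_d) + 0.5*N(-theta, I_d).

    For this symmetric model the E-step (posterior label weights) and M-step
    (weighted mean) reduce to the closed-form update
        theta <- (1/n) * sum_i tanh(<X_i, theta>) * X_i.
    """
    n, d = X.shape
    rng = np.random.default_rng() if rng is None else rng
    if n_iter is None:
        n_iter = int(np.ceil(np.sqrt(n)))  # O(sqrt(n)) iterations, per the theorem
    # Random initialization; the radius 1/sqrt(d) is an illustrative choice,
    # not one prescribed by the abstract.
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta) * np.sqrt(d)
    for _ in range(n_iter):
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta

# Illustrative run: d = 10, n chosen so that n >> d log^3 d.
rng = np.random.default_rng(0)
d, n = 10, 100_000
theta_star = np.full(d, 0.5 / np.sqrt(d))
signs = rng.choice([-1.0, 1.0], size=n)
X = signs[:, None] * theta_star + rng.standard_normal((n, d))
theta_hat = em_symmetric_gmm(X, rng=rng)
# theta* is identifiable only up to sign, so report the error minimized over +/-.
err = min(np.linalg.norm(theta_hat - theta_star), np.linalg.norm(theta_hat + theta_star))
print(f"estimation error: {err:.3f}")
```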

[1] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 1977.

[2] R. A. Redner and H. F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 1984.

[3] G. R. Shorack and J. A. Wellner. Empirical Processes with Applications to Statistics, 2009.

[4] M. Talagrand. The transportation cost from the uniform measure to the empirical measure in dimension ≥ 3. The Annals of Probability, 1994.

[5] L. Xu and M. I. Jordan. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 1996.

[6] Y. Yang and A. Barron. Information-theoretic determination of minimax rates of convergence. The Annals of Statistics, 1999.

[7] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 2000.

[8] C. Biernacki, G. Celeux, and G. Govaert. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 2003.

[9] C. Villani. Topics in Optimal Transportation. American Mathematical Society, 2003.

[10] Y. I. Ingster and I. A. Suslina. Nonparametric Goodness-of-Fit Testing Under Gaussian Models. Springer, 2003.

[11] N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure, 2013. arXiv:1312.2128.

[12] S. Balakrishnan, M. J. Wainwright, and B. Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 2017.

[13] P. Heinrich and J. Kahn. Optimal rates for finite mixture estimation, 2015. arXiv:1507.04313.

[14] P. Heinrich and J. Kahn. Minimax rates for finite mixture estimation, 2015. arXiv:1504.03506.

[15] Y. Lu and H. H. Zhou. Statistical and computational guarantees of Lloyd's algorithm and its variants, 2016. arXiv preprint.

[16] J. Xu, D. J. Hsu, and A. Maleki. Global analysis of Expectation Maximization for mixtures of two Gaussians. In Advances in Neural Information Processing Systems (NIPS), 2016.

[17] C. Jin, Y. Zhang, S. Balakrishnan, M. J. Wainwright, and M. I. Jordan. Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. In Advances in Neural Information Processing Systems (NIPS), 2016.

[18] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products. Academic Press.

[19] Y. Wu and P. Yang. Optimal estimation of Gaussian mixtures via denoised method of moments. The Annals of Statistics, 2020.

[20] M. Ndaoud. Sharp optimal recovery in the two component Gaussian mixture model, 2018. arXiv:1812.08078.

[21] S. Mei, Y. Bai, and A. Montanari. The landscape of empirical risk for nonconvex losses. The Annals of Statistics, 2018.

[22] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.

[23] J. Kwon, W. Qian, Y. Chen, C. Caramanis, and D. Davis. Global convergence of the EM algorithm for mixtures of two component linear regression, 2018. arXiv preprint.

[24] R. Dwivedi, N. Ho, K. Khamaru, M. J. Wainwright, M. I. Jordan, and B. Yu. Challenges with EM in application to weakly identifiable mixture models, 2019. arXiv preprint.

[25] Y. Chen, Y. Chi, J. Fan, and C. Ma. Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Mathematical Programming, 2019.

[26] R. Dwivedi, N. Ho, K. Khamaru, M. J. Wainwright, M. I. Jordan, and B. Yu. Singularity, misspecification and the convergence rate of EM. The Annals of Statistics, 2020.
