The EM Algorithm is Adaptively-Optimal for Unbalanced Symmetric Gaussian Mixtures

This paper studies the problem of estimating the means $\pm\theta_{*}\in\mathbb{R}^{d}$ of a symmetric two-component Gaussian mixture $\delta_{*}\cdot N(\theta_{*},I)+(1-\delta_{*})\cdot N(-\theta_{*},I)$, where the weights $\delta_{*}$ and $1-\delta_{*}$ are unequal. Assuming that $\delta_{*}$ is known, we show that the population version of the EM algorithm converges globally whenever the initial estimate has a non-negative inner product with the mean of the larger-weight component; this condition is satisfied by the trivial initialization $\theta_{0}=0$. For the empirical iteration based on $n$ samples, we show that when initialized at $\theta_{0}=0$, the EM algorithm adaptively achieves the minimax error rate $\tilde{O}\Big(\min\Big\{\frac{1}{(1-2\delta_{*})}\sqrt{\frac{d}{n}},\frac{1}{\|\theta_{*}\|}\sqrt{\frac{d}{n}},\left(\frac{d}{n}\right)^{1/4}\Big\}\Big)$ in no more than $O\Big(\frac{1}{\|\theta_{*}\|(1-2\delta_{*})}\Big)$ iterations, with high probability. We also consider the EM iteration for estimating the weight $\delta_{*}$, assuming a fixed mean $\theta$ (possibly mismatched to $\theta_{*}$). For the empirical iteration based on $n$ samples, we show that the minimax error rate $\tilde{O}\Big(\frac{1}{\|\theta_{*}\|}\sqrt{\frac{d}{n}}\Big)$ is achieved in no more than $O\Big(\frac{1}{\|\theta_{*}\|^{2}}\Big)$ iterations. These results robustify and complement recent results of Wu and Zhou [2] obtained for the equal-weights case $\delta_{*}=1/2$.
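As an illustrative sketch (not code from the paper), the two empirical EM iterations described above admit one-line closed-form updates for this symmetric model: with known weight $\delta$, the posterior of the $+\theta$ component satisfies $2p_i-1=\tanh(\langle X_i,\theta_t\rangle+b)$ with $b=\tfrac12\log\frac{\delta}{1-\delta}$, so the M-step is $\theta_{t+1}=\frac1n\sum_i\tanh(\langle X_i,\theta_t\rangle+b)X_i$; with fixed mean $\theta$, the weight update is the average responsibility. Function and variable names below are our own.

```python
import numpy as np

def em_mean(X, delta, iters=200):
    """EM for the mean theta of delta*N(theta, I) + (1-delta)*N(-theta, I),
    with the weight delta known and the trivial initialization theta_0 = 0.
    For this symmetric model the E- and M-steps collapse to
        theta_{t+1} = mean_i tanh(<X_i, theta_t> + b) * X_i,
    where b = 0.5 * log(delta / (1 - delta))."""
    b = 0.5 * np.log(delta / (1.0 - delta))
    theta = np.zeros(X.shape[1])               # trivial initialization theta_0 = 0
    for _ in range(iters):
        w = np.tanh(X @ theta + b)             # E-step: 2*P(z_i = +1 | X_i) - 1
        theta = (w[:, None] * X).mean(axis=0)  # M-step
    return theta

def em_weight(X, theta, iters=200):
    """EM for the weight delta, with a fixed (possibly mismatched) mean theta."""
    a = X @ theta
    delta = 0.5
    for _ in range(iters):
        # Posteriors delta*e^a / (delta*e^a + (1-delta)*e^-a),
        # rescaled by e^{-|a|} for numerical stability.
        p = delta * np.exp(a - np.abs(a))
        q = (1.0 - delta) * np.exp(-a - np.abs(a))
        delta = (p / (p + q)).mean()           # M-step: average responsibility
    return delta

# Usage sketch on synthetic data (d = 2, n = 50000, delta_* = 0.3).
rng = np.random.default_rng(0)
d, n, delta_true = 2, 50000, 0.3
theta_true = np.array([1.5, 0.0])
z = np.where(rng.random(n) < delta_true, 1.0, -1.0)
X = z[:, None] * theta_true + rng.standard_normal((n, d))
theta_hat = em_mean(X, delta_true)    # close to theta_true for this sample size
delta_hat = em_weight(X, theta_true)  # close to delta_true
```

Note that at $\theta_0=0$ every posterior equals $\delta$, so the first iterate is $(2\delta-1)\bar X\approx(2\delta-1)^2\theta_{*}$, already a non-negative multiple of $\theta_{*}$; subsequent iterates refine this direction.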

[1] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science, 2018.

[2] Y. Wu and H. H. Zhou, Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in O(√n) iterations, 2019, Mathematical Statistics and Learning.

[3] Martin J. Wainwright et al., Challenges with EM in application to weakly identifiable mixture models, 2019, arXiv.

[4] Constantine Caramanis et al., Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression, 2018, arXiv.

[5] Yuekai Sun et al., Statistical convergence of the EM algorithm on Gaussian mixture models, 2018, Electronic Journal of Statistics.

[6] Michael I. Jordan et al., Singularity, misspecification and the convergence rate of EM, 2018, The Annals of Statistics.

[7] Yihong Wu et al., Optimal estimation of Gaussian mixtures via denoised method of moments, 2018, The Annals of Statistics.

[8] Jason M. Klusowski et al., Estimating the Coefficients of a Mixture of Two Linear Regressions by Expectation Maximization, 2017, IEEE Transactions on Information Theory.

[9] Can Yang et al., On the Convergence of the EM Algorithm: A Data-Adaptive Analysis, 2016, arXiv:1611.00519.

[10] Christos Tzamos et al., Ten Steps of EM Suffice for Mixtures of Two Gaussians, 2016, COLT.

[11] Jason M. Klusowski et al., Statistical Guarantees for Estimating the Centers of a Two-component Gaussian Mixture by EM, 2016, arXiv:1608.02280.

[12] Arian Maleki et al., Global Analysis of Expectation Maximization for Mixtures of Two Gaussians, 2016, NIPS.

[13] Constantine Caramanis et al., Regularized EM Algorithms: A Unified Framework and Statistical Guarantees, 2015, NIPS.

[14] Martin J. Wainwright et al., Statistical and computational guarantees for the Baum-Welch algorithm, 2015, 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[15] J. Kahn et al., Optimal rates for finite mixture estimation, 2015, arXiv:1507.04313.

[16] Zhaoran Wang et al., High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality, 2014, arXiv:1412.8729.

[17] Martin J. Wainwright et al., Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014, arXiv.

[18] A. Guillin et al., On the rate of convergence in Wasserstein distance of the empirical measure, 2013, arXiv:1312.2128.

[19] S. Boucheron, G. Lugosi and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.

[20] Anima Anandkumar et al., Tensor decompositions for learning latent variable models, 2012, Journal of Machine Learning Research.

[21] Maya R. Gupta et al., Theory and Use of the EM Algorithm, 2011, Foundations and Trends in Signal Processing.

[22] Alfred O. Hero et al., On EM algorithms and their proximal generalizations, 2008, arXiv:1201.5912.

[23] Yu. I. Ingster and I. A. Suslina, Nonparametric Goodness-of-Fit Testing Under Gaussian Models, 2003.

[24] C. Villani, Topics in Optimal Transportation, 2003.

[25] Dimitris Karlis et al., Choosing Initial Values for the EM Algorithm for Finite Mixtures, 2003, Computational Statistics & Data Analysis.

[26] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, 1996.

[27] Xiao-Li Meng et al., On the global and componentwise rates of convergence of the EM algorithm, 1994.

[28] R. Redner and H. Walker, Mixture densities, maximum likelihood, and the EM algorithm, 1984, SIAM Review.

[29] C. F. J. Wu, On the convergence properties of the EM algorithm, 1983, The Annals of Statistics.

[30] A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977, Journal of the Royal Statistical Society, Series B.

[31] E. Beale et al., Missing Values in Multivariate Analysis, 1975.

[32] L. Baum et al., A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.

[33] V. Hasselblad, Finite mixtures of distributions from the exponential family, 1969.

[34] V. Hasselblad, Estimation of parameters for a mixture of normal distributions, 1966.

[35] M. Healy et al., Missing Values in Experiments Analysed on Automatic Computers, 1956.

[36] Martin J. Wainwright et al., Theoretical guarantees for EM under misspecified Gaussian mixture models, 2018, NeurIPS.

[37] Dhroova Aiylam, Parameter estimation in HMMs with guaranteed convergence, 2018.

[38] Purnamrita Sarkar et al., Convergence of Gradient EM on Multi-component Mixture of Gaussians, 2017, NIPS.

[39] R. Sundberg, Maximum Likelihood Theory for Incomplete Data from an Exponential Family, 2016.

[40] Michael I. Jordan et al., On Convergence Properties of the EM Algorithm for Gaussian Mixtures, 1996, Neural Computation.

[41] Jeffrey A. Fessler et al., Convergence in Norm for Alternating Expectation-Maximization (EM) Type Algorithms, 1995.

[42] R. Okafor, Maximum likelihood estimation from incomplete data, 1987.

[43] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications, 1979.

[44] M. Woodbury, A missing information principle: theory and applications, 1972.