Theoretical guarantees for EM under misspecified Gaussian mixture models

Recent years have witnessed substantial progress in understanding the behavior of EM for correctly specified mixture models. Since model misspecification is common in practice, it is important to understand EM in this more general setting. We provide non-asymptotic guarantees for population and sample-based EM for parameter estimation in several specific univariate settings of misspecified Gaussian mixture models. Due to the misspecification, the EM iterates no longer converge to the true model; instead they converge to the projection of the true model onto the class of models being fit. We provide two classes of theoretical guarantees: first, we characterize the bias introduced by the misspecification; second, we prove that population EM converges at a geometric rate to the model projection under a suitable initialization condition. This geometric convergence rate for population EM implies a statistical complexity of order $1/\sqrt{n}$ when running EM with $n$ samples. We validate our theoretical findings via several numerical examples.
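To make the misspecification bias concrete, the following is a minimal numerical sketch (not taken from the paper) of one univariate setting of the kind described above: sample-based EM is run on a symmetric two-component location mixture whose variance is deliberately fixed at the wrong value, so the iterates settle near the projection of the true model rather than at the true location parameter. All constants (theta_star, sigma_star, sigma_fit, n) are illustrative choices, not values from the paper.

# Illustrative sketch (not the paper's code): sample-based EM for the symmetric
# two-component mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2) with the
# variance sigma^2 held fixed, run on data whose true variance differs.
import numpy as np

rng = np.random.default_rng(0)

# True (data-generating) model: symmetric location mixture with variance sigma_star^2.
theta_star, sigma_star = 1.0, 1.5
n = 50_000
signs = rng.choice([-1.0, 1.0], size=n)
x = signs * theta_star + sigma_star * rng.standard_normal(n)

# Fitted (misspecified) model: same family, but variance fixed at sigma_fit^2 != sigma_star^2.
sigma_fit = 1.0

def em_update(theta, x, sigma):
    """One EM step for the symmetric two-component mixture with known variance.

    The E-step posterior weight for the +theta component is sigmoid(2*theta*x/sigma^2),
    so the M-step has the closed form theta_new = mean(x * tanh(theta * x / sigma^2)).
    """
    return np.mean(x * np.tanh(theta * x / sigma**2))

theta = 0.5  # initialization within the basin suggested by the theory
for t in range(50):
    theta = em_update(theta, x, sigma_fit)

print(f"EM fixed point ~ {theta:.3f}, true location theta* = {theta_star}")

Because sigma_fit differs from sigma_star, the printed fixed point is biased away from theta_star: it sits near the projection of the true model onto the fitted family, and the gap shrinks as sigma_fit approaches sigma_star, consistent with the bias characterization and geometric convergence described in the abstract.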
