On the Convergence of the EM Algorithm: A Data-Adaptive Analysis

The Expectation-Maximization (EM) algorithm is an iterative method for maximizing the log-likelihood function in parameter estimation. Prior work on the convergence of the EM algorithm has established results on the asymptotic (population-level) convergence rate of the algorithm. In this paper, we give a data-adaptive analysis of the sample-level local convergence rate of the EM algorithm. In particular, we show that the local convergence rate of the EM algorithm is a random variable $\overline{K}_{n}$ derived from the data-generating distribution, which adaptively yields the convergence rate of the EM algorithm on each finite-sample data set drawn from the same population distribution. We then give a non-asymptotic concentration bound of $\overline{K}_{n}$ around the population-level optimal convergence rate $\overline{\kappa}$ of the EM algorithm, which implies that $\overline{K}_{n}\to\overline{\kappa}$ in probability as the sample size $n\to\infty$. Our theory identifies the effect of sample size on the convergence behavior of the sample EM sequence, and explains a surprising phenomenon observed in applications of the EM algorithm: the finite-sample version of the algorithm sometimes converges even faster than the population version. We apply our theory to the EM algorithm on three canonical models and obtain a specific form of the adaptive convergence theorem for each model.
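The abstract's notion of a data-adaptive rate can be illustrated numerically. The sketch below runs EM on one canonical model, a balanced two-component Gaussian mixture $0.5\,N(\theta,1)+0.5\,N(-\theta,1)$, and measures the empirical per-iteration contraction of the sample EM sequence toward its fixed point as a proxy for $\overline{K}_{n}$. This is an illustrative simulation under our own assumptions, not the paper's construction; the names `em_step`, `K_n`, and the chosen parameters are hypothetical.

```python
import numpy as np

def em_step(theta, x):
    # One EM iteration for the mixture 0.5*N(theta,1) + 0.5*N(-theta,1).
    # E-step: posterior weight of the +theta component for each observation.
    w = 1.0 / (1.0 + np.exp(-2.0 * theta * x))
    # M-step: closed-form update of the location parameter.
    return float(np.mean((2.0 * w - 1.0) * x))

rng = np.random.default_rng(0)
theta_star, n = 2.0, 5000
x = rng.choice([-1.0, 1.0], size=n) * theta_star + rng.normal(size=n)

# Run EM to numerical convergence to locate the sample fixed point.
theta_hat = 0.5
for _ in range(200):
    theta_hat = em_step(theta_hat, x)

# From a fresh start, record per-iteration contraction ratios
# |theta_{t+1} - theta_hat| / |theta_t - theta_hat|; their size
# plays the role of the sample-level rate K_n in the abstract.
theta, ratios = 0.5, []
for _ in range(10):
    gap = abs(theta - theta_hat)
    if gap < 1e-8:          # stop once we are at the fixed point
        break
    new = em_step(theta, x)
    ratios.append(abs(new - theta_hat) / gap)
    theta = new
K_n = max(ratios)
```

Rerunning this with different samples of the same size gives different values of `K_n`, which is exactly the random-variable view of the convergence rate described above; as `n` grows, these values should concentrate around a population-level constant.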
