Damped Anderson Acceleration With Restarts and Monotonicity Control for Accelerating EM and EM-like Algorithms

Abstract: The expectation-maximization (EM) algorithm is a well-known iterative method for computing maximum likelihood estimates in a variety of statistical problems. Despite its numerous advantages, a main drawback of the EM algorithm is its frequently observed slow convergence, which often hinders its application in high-dimensional problems or other complex settings. To address the need for more rapidly convergent EM algorithms, we describe a new class of acceleration schemes that build on the Anderson acceleration technique for speeding up fixed-point iterations. Our approach greatly accelerates the convergence of EM algorithms and scales automatically to high-dimensional settings. Through the introduction of periodic algorithm restarts and a damping factor, our acceleration scheme provides faster and more robust convergence than unmodified Anderson acceleration, while also improving global convergence. Crucially, our method works as an “off-the-shelf” method: it may be used directly to accelerate any EM algorithm without relying on model-specific features or insights. Through a series of simulation studies involving five representative problems, we show that our algorithm is substantially faster than existing state-of-the-art acceleration schemes. The acceleration schemes described in this paper are implemented in the R package daarem, which is available from the Comprehensive R Archive Network (https://cran.r-project.org). Supplementary materials for this article are available online.
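To make the ingredients of the scheme concrete, the following Python sketch applies Anderson acceleration with a fixed damping factor, periodic history restarts, and a simple monotonicity fallback to a generic fixed-point map G (such as an EM update). This is an illustrative sketch, not the exact DAAREM algorithm of the paper: the convex-combination form of the damping, the restart rule, and all tuning constants are assumptions chosen for demonstration, and the optional loglik argument is a hypothetical hook for monotonicity control.

```python
# Minimal sketch of damped Anderson acceleration with restarts and a
# monotonicity fallback for a fixed-point map G (e.g., an EM update).
# NOT the authors' exact DAAREM algorithm; damping form, restart rule,
# and defaults below are illustrative assumptions.
import numpy as np

def accelerated_fixed_point(G, x0, m=5, damping=0.5, restart_every=20,
                            loglik=None, tol=1e-8, max_iter=500):
    """Accelerate the fixed-point iteration x <- G(x); x0 is a 1-D vector."""
    x = np.asarray(x0, dtype=float)
    xs, fs = [], []                        # recent iterates and residuals
    for k in range(max_iter):
        gx = G(x)                          # one plain EM / fixed-point step
        f = gx - x                         # fixed-point residual
        if np.linalg.norm(f) < tol:
            return gx, k
        if k % restart_every == 0:         # periodic restart: drop history
            xs, fs = [], []
        xs.append(x); fs.append(f)
        xs, fs = xs[-(m + 1):], fs[-(m + 1):]  # keep at most m differences
        if len(xs) < 2:                    # not enough history: plain step
            x = gx
            continue
        # Columns are successive differences of iterates and residuals.
        dX = np.column_stack([xs[i + 1] - xs[i] for i in range(len(xs) - 1)])
        dF = np.column_stack([fs[i + 1] - fs[i] for i in range(len(fs) - 1)])
        # Least-squares mixing weights: minimize ||f - dF @ gamma||.
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x_acc = x + f - (dX + dF) @ gamma  # standard Anderson update
        x_new = damping * x_acc + (1.0 - damping) * gx  # damped step
        # Monotonicity control: reject the accelerated step if it lowers
        # the objective relative to the guaranteed-monotone EM step.
        if loglik is not None and loglik(x_new) < loglik(gx):
            x_new = gx
        x = x_new
    return x, max_iter
```

In use, G would be the model-specific EM update (the map whose fixed point is the maximizer), for example accelerated_fixed_point(em_update, theta_init, loglik=observed_loglik), where em_update and observed_loglik are hypothetical user-supplied functions for the model at hand.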
