A Tutorial on MM Algorithms

Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the log-likelihood. Iterative optimization of a surrogate function, as exemplified by an EM algorithm, does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM (minorize-maximize or majorize-minimize) optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. This article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. Numerous examples throughout the article illustrate these concepts. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation.
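To make the majorize-minimize idea concrete, here is a minimal Python sketch (an illustration, not code from the article) that approximates a sample median. Because the absolute value is not differentiable at zero, the sketch assumes a small perturbation eps and minimizes f_eps(theta) = sum_i sqrt((x_i - theta)^2 + eps); the names `mm_median` and `smoothed_l1` and the default tolerances are hypothetical choices. Concavity of the square root gives sqrt(u) <= sqrt(u_m) + (u - u_m) / (2 sqrt(u_m)), so each term is majorized by a quadratic in theta, and minimizing that surrogate reduces to a weighted mean with weights 1 / sqrt((x_i - theta_m)^2 + eps). The descent property f(theta_{m+1}) <= g(theta_{m+1} | theta_m) <= g(theta_m | theta_m) = f(theta_m) means the recorded objective values should never increase.

```python
import math


def smoothed_l1(theta, xs, eps):
    """Perturbed objective f_eps(theta) = sum_i sqrt((x_i - theta)^2 + eps)."""
    return sum(math.sqrt((x - theta) ** 2 + eps) for x in xs)


def mm_median(xs, theta0=None, eps=1e-8, tol=1e-10, max_iter=500):
    """Approximate the sample median by majorize-minimize (MM) iterations.

    Each step minimizes a quadratic surrogate that majorizes f_eps at the
    current iterate; its minimizer is a weighted mean of the data.
    Returns the final estimate and the history of objective values.
    """
    # Start from the sample mean unless a starting value is supplied.
    theta = sum(xs) / len(xs) if theta0 is None else theta0
    history = [smoothed_l1(theta, xs, eps)]
    for _ in range(max_iter):
        # Concavity of sqrt: sqrt(u) <= sqrt(u_m) + (u - u_m) / (2 * sqrt(u_m)),
        # so term i is majorized by a quadratic in theta with weight w_i.
        weights = [1.0 / math.sqrt((x - theta) ** 2 + eps) for x in xs]
        # Minimizing the sum of weighted quadratics gives a weighted mean.
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        history.append(smoothed_l1(new_theta, xs, eps))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, history


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    estimate, objective_history = mm_median(data)
    print(f"MM estimate: {estimate:.6f} after {len(objective_history) - 1} iterations")
```

For the data above, the iterates start at the mean (22), move to roughly 8.28 and then 4.17, and settle at approximately 3, the sample median, while the objective decreases at every step. The outlier at 100 exerts little pull because its weight 1 / |100 - theta| is small.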
