A Legacy of EM Algorithms

Nan Laird has had an enormous and growing impact on computational statistics. Her paper with Dempster and Rubin on the expectation‐maximisation (EM) algorithm is the second most cited paper in statistics. Her papers and book on longitudinal modelling are nearly as impressive. In this brief survey, we revisit the derivation of some of her most useful algorithms from the perspective of the minorisation‐maximisation (MM) principle. The MM principle generalises the EM principle and frees it from the shackles of missing data and conditional expectations. Instead, the focus shifts to the construction of surrogate functions via standard mathematical inequalities. The MM principle can deliver a classical EM algorithm with less fuss or an entirely new algorithm with a faster rate of convergence. In either case, the MM principle enriches our understanding of the EM principle and suggests new algorithms of considerable potential in high‐dimensional settings where standard algorithms such as Newton's method and Fisher scoring falter.
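
As a small illustration of building a surrogate from a standard inequality, consider estimating the location mu of a univariate t distribution with known degrees of freedom nu and unit scale. This example is an assumption chosen for illustration and is not taken from the survey itself. Up to a constant, the log-likelihood is -((nu + 1)/2) * sum_i log(nu + (x_i - mu)^2). Because -log(s) is convex, the supporting-line inequality -log(s) >= -log(s_n) - (s - s_n)/s_n minorises each term by a quadratic in mu, and maximising that surrogate gives a weighted mean with weights 1/(nu + (x_i - mu_n)^2). This is the same update as the familiar EM iteration for the t model, reached here without introducing missing data. The function names below are hypothetical.

```python
import math


def t_loglik(mu, xs, nu):
    """Log-likelihood in mu, up to an additive constant, for a
    t location model with unit scale and nu degrees of freedom."""
    return -0.5 * (nu + 1.0) * sum(math.log(nu + (x - mu) ** 2) for x in xs)


def mm_t_location(xs, nu=4.0, tol=1e-10, max_iter=500):
    """MM iteration for the t location parameter (illustrative sketch).

    Each step maximises the quadratic minoriser obtained from the
    supporting-line bound on -log, which reduces to a weighted mean.
    Returns the final estimate and the log-likelihood at every iterate.
    """
    mu = sorted(xs)[len(xs) // 2]  # start from a middle order statistic
    history = [t_loglik(mu, xs, nu)]
    for _ in range(max_iter):
        # Weights come from the tangent-line bound at the current iterate.
        weights = [1.0 / (nu + (x - mu) ** 2) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        history.append(t_loglik(new_mu, xs, nu))
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    return mu, history
```

Calling `mm_t_location([-1.2, 0.3, 0.5, 0.9, 1.1, 25.0])` returns an estimate that largely ignores the outlying value 25.0, and the recorded log-likelihoods never decrease, which is the ascent property that every MM algorithm inherits from the minorisation conditions.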
