The EM Algorithm—an Old Folk‐song Sung to a Fast New Tune

Celebrating the 20th anniversary of the presentation of the paper by Dempster, Laird and Rubin which popularized the EM algorithm, we investigate, after a brief historical account, strategies that aim to make the EM algorithm converge faster while maintaining its simplicity and stability (e.g. automatic monotone convergence in likelihood). First we introduce the idea of a ‘working parameter’ to facilitate the search for efficient data augmentation schemes and thus fast EM implementations. Second, summarizing various recent extensions of the EM algorithm, we formulate a general alternating expectation–conditional maximization algorithm AECM that couples flexible data augmentation schemes with model reduction schemes to achieve efficient computations. We illustrate these methods using multivariate t-models with known or unknown degrees of freedom and Poisson models for image reconstruction. We show, through both empirical and theoretical evidence, the potential for a dramatic reduction in computational time with little increase in human effort. We also discuss the intrinsic connection between EM-type algorithms and the Gibbs sampler, and the possibility of using the techniques presented here to speed up the latter. The main conclusion of the paper is that, with the help of statistical considerations, it is possible to construct algorithms that are simple, stable and fast.
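As a rough illustration of the kind of algorithm discussed here, the following is a minimal sketch of EM for a univariate t location-scale model with known degrees of freedom. It is not the paper's implementation. The function names, the `scale_by_weights` flag and the toy data are hypothetical. The flag switches the scale update from dividing by n to dividing by the sum of the weights, a rescaling reported in the t-model literature that has the same fixed points as the standard update. Its link to a particular choice of working parameter is our assumption about how it fits the framework, not something stated in the abstract.

```python
import math


def t_loglik(y, mu, sigma2, nu):
    """Observed-data log-likelihood of a univariate t location-scale model."""
    n = len(y)
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * sigma2))
    tail = sum(math.log1p((yi - mu) ** 2 / (nu * sigma2)) for yi in y)
    return n * const - (nu + 1) / 2 * tail


def em_t(y, nu, n_iter=500, scale_by_weights=False):
    """EM for (mu, sigma2) with nu known; returns estimates and the log-likelihood trace.

    scale_by_weights is a hypothetical flag: False gives the standard EM
    scale update (divide by n), True divides by the sum of the weights.
    """
    n = len(y)
    mu = sum(y) / n                                  # start at sample mean
    sigma2 = sum((yi - mu) ** 2 for yi in y) / n     # and sample variance
    trace = [t_loglik(y, mu, sigma2, nu)]
    for _ in range(n_iter):
        # E-step: expected latent precision weights given current parameters
        w = [(nu + 1) / (nu + (yi - mu) ** 2 / sigma2) for yi in y]
        # M-step: weighted mean, then weighted scale
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        ss = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
        sigma2 = ss / (sw if scale_by_weights else n)
        trace.append(t_loglik(y, mu, sigma2, nu))
    return mu, sigma2, trace


# Toy data (made up for illustration): a small cluster plus one outlying value.
y = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 1.5, 5.0]
mu_std, s2_std, trace_std = em_t(y, nu=4.0)
mu_alt, s2_alt, trace_alt = em_t(y, nu=4.0, scale_by_weights=True)
```

The traces `trace_std` and `trace_alt` can be compared to see how quickly each variant approaches the same maximum. The trace for the standard update should never decrease, which is the automatic monotone convergence mentioned above.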
