The EM Algorithm—an Old Folk‐song Sung to a Fast New Tune

Celebrating the 20th anniversary of the presentation of the paper by Dempster, Laird and Rubin which popularized the EM algorithm, we investigate, after a brief historical account, strategies that aim to make the EM algorithm converge faster while maintaining its simplicity and stability (e.g. automatic monotone convergence in likelihood). First we introduce the idea of a ‘working parameter’ to facilitate the search for efficient data augmentation schemes and thus fast EM implementations. Second, summarizing various recent extensions of the EM algorithm, we formulate a general alternating expectation–conditional maximization algorithm AECM that couples flexible data augmentation schemes with model reduction schemes to achieve efficient computations. We illustrate these methods using multivariate t-models with known or unknown degrees of freedom and Poisson models for image reconstruction. We show, through both empirical and theoretical evidence, the potential for a dramatic reduction in computational time with little increase in human effort. We also discuss the intrinsic connection between EM-type algorithms and the Gibbs sampler, and the possibility of using the techniques presented here to speed up the latter. The main conclusion of the paper is that, with the help of statistical considerations, it is possible to construct algorithms that are simple, stable and fast.
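To make the t-model illustration concrete, here is a minimal sketch that assumes a univariate t location-scale model with known degrees of freedom. It compares the textbook EM update for the scale, which divides by the sample size, with a variant that divides by the sum of the current weights. The variant is a commonly cited form of the faster update for t-models, but it is shown here only as an illustration and not as the paper's exact construction. The names `fit_t`, `t_loglik` and `rescale_by_weights`, and the toy data, are hypothetical.

```python
import math


def t_loglik(y, mu, sigma2, nu):
    """Observed-data log-likelihood of a univariate t location-scale model."""
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * sigma2))
    return sum(const - 0.5 * (nu + 1) * math.log1p((yi - mu) ** 2 / (nu * sigma2))
               for yi in y)


def fit_t(y, nu, rescale_by_weights, tol=1e-10, max_iter=5000):
    """EM-type fit of (mu, sigma2) with known nu.

    rescale_by_weights=False: textbook EM (scale update divides by n).
    rescale_by_weights=True: assumed faster variant (divides by sum of weights).
    Returns (mu, sigma2, iterations, log-likelihood trace).
    """
    n = len(y)
    mu = sum(y) / n
    sigma2 = sum((yi - mu) ** 2 for yi in y) / n
    trace = [t_loglik(y, mu, sigma2, nu)]
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # E-step: expected latent precision weight for each observation.
        w = [(nu + 1) / (nu + (yi - mu) ** 2 / sigma2) for yi in y]
        sw = sum(w)
        # M-step: weighted mean, then weighted sum of squares.
        new_mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        ss = sum(wi * (yi - new_mu) ** 2 for wi, yi in zip(w, y))
        new_sigma2 = ss / (sw if rescale_by_weights else n)
        trace.append(t_loglik(y, new_mu, new_sigma2, nu))
        converged = abs(new_mu - mu) < tol and abs(new_sigma2 - sigma2) < tol
        mu, sigma2 = new_mu, new_sigma2
        if converged:
            break
    return mu, sigma2, iterations, trace


if __name__ == "__main__":
    # Toy data with one outlier (made up for illustration).
    data = [-1.2, -0.4, 0.1, 0.3, 0.5, 0.9, 1.1, 1.8, 2.2, 9.5]
    for flag in (False, True):
        mu, sigma2, iters, _ = fit_t(data, nu=4.0, rescale_by_weights=flag)
        print(f"rescale_by_weights={flag}: mu={mu:.6f}, sigma2={sigma2:.6f}, iterations={iters}")
```

Both updates share the same fixed points, because the likelihood equations force the weights to sum to the sample size at the maximum likelihood estimate. Any difference therefore shows up only in how many iterations each variant needs, which can be read off the printed counts.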
