The Noisy Expectation–Maximization Algorithm

We present a noise-injected version of the expectation–maximization (EM) algorithm: the noisy expectation–maximization (NEM) algorithm. The NEM algorithm uses noise to speed up the convergence of the EM algorithm. The NEM theorem shows that additive noise speeds up the average convergence of the EM algorithm to a local maximum of the likelihood surface if a positivity condition holds. Corollary results give special cases in which noise improves the EM algorithm. We demonstrate these noise benefits on EM algorithms for three data models: the Gaussian mixture model (GMM), the Cauchy mixture model (CMM), and the censored log-convex gamma model. The NEM positivity condition simplifies to a quadratic inequality in the GMM and CMM cases. A final theorem shows that the noise benefit for independent and identically distributed (i.i.d.) additive noise decreases with sample size in mixture models. This theorem implies that the noise benefit is most pronounced when the data set is sparse.
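
As a rough illustration of how an annealed noise-injection step can be grafted onto an ordinary EM loop, the following Python sketch shows one possible NEM iteration for a one-dimensional GMM. The function name `nem_gmm`, the noise scale `sigma_n`, and the annealing exponent `tau` are illustrative choices rather than quantities fixed by the abstract; the quadratic screening test is the GMM form of the NEM positivity condition mentioned above, and noise samples that violate it are simply zeroed out.

```python
import numpy as np

def nem_gmm(y, K=2, n_iters=50, sigma_n=1.0, tau=2.0, seed=0):
    """Sketch of a noisy-EM loop for a 1-D K-component GMM.

    Assumed (hypothetical) parameters:
      sigma_n -- scale of the candidate injected noise
      tau     -- annealing exponent; noise decays as iteration**(-tau)
    """
    rng = np.random.default_rng(seed)
    N = len(y)
    # Crude initialization of weights, means, and variances.
    w = np.full(K, 1.0 / K)
    mu = rng.choice(y, K, replace=False)
    var = np.full(K, np.var(y))

    for k in range(1, n_iters + 1):
        # --- NEM noise-injection step ---
        n = sigma_n * k ** (-tau) * rng.standard_normal(N)
        # GMM positivity condition (quadratic inequality):
        # n_i * (n_i - 2*(mu_j - y_i)) <= 0 for every component j.
        ok = np.all(
            n[:, None] * (n[:, None] - 2.0 * (mu[None, :] - y[:, None])) <= 0.0,
            axis=1,
        )
        # Zero out noise samples that violate the condition.
        y_noisy = y + np.where(ok, n, 0.0)

        # --- standard EM steps on the noise-perturbed data ---
        # E-step: component responsibilities.
        dens = np.exp(-0.5 * (y_noisy[:, None] - mu[None, :]) ** 2 / var[None, :])
        dens /= np.sqrt(2.0 * np.pi * var[None, :])
        r = w[None, :] * dens
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update mixture weights, means, and variances.
        Nk = r.sum(axis=0)
        w = Nk / N
        mu = (r * y_noisy[:, None]).sum(axis=0) / Nk
        var = (r * (y_noisy[:, None] - mu[None, :]) ** 2).sum(axis=0) / Nk

    return w, mu, var
```

The k**(-tau) decay makes the injected noise vanish as iterations proceed, so the loop reduces to ordinary EM in the limit; the screening test keeps only noise realizations that the GMM positivity condition allows to help convergence.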
