PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS

It is well known that the likelihood sequence of the EM algorithm is non-decreasing and convergent (Dempster, Laird and Rubin (1977)), and that the limit points of the EM algorithm are stationary points of the likelihood (Wu (1983)), but the issue of the convergence of the EM sequence itself has not been completely settled. In this paper we close this gap and show that under general, simple, verifiable conditions, any EM sequence is convergent. In pathological cases we show that the sequence is cycling in the limit among a finite number of stationary points with equal likelihood. The results apply equally to the optimization transfer class of algorithms (MM algorithm) of Lange, Hunter, and Yang (2000). Two different EM algorithms constructed on the same dataset illustrate the convergence and the cyclic behavior.

This paper contains new results concerning the convergence of the EM algorithm. The EM algorithm was brought into the limelight by Dempster, Laird and Rubin (1977) as a general iterative method of computing the maximum likelihood estimator by maximizing a simpler likelihood on an augmented data space. However, the problem of the convergence of the algorithm has not been satisfactorily resolved. Wu (1983), the main theoretical contribution in this area, showed that the limit points of the EM algorithm are stationary points of the likelihood, and that when the likelihood is unimodal, any EM sequence is convergent. Boyles (1983) has a number of results along similar lines. These results still allow the possibility of a non-convergent EM sequence when the likelihood is not unimodal. More importantly, the EM algorithm is useful when the likelihood is hard to obtain directly; for these cases, the unimodality of the likelihood is very difficult to verify. Here we give simple, general, verifiable conditions for convergence: our main result (Theorem 3) is that any EM sequence is convergent, if the maximizer at the M-step is unique.
This condition is almost always satisfied in practice (otherwise the particular EM data augmentation scheme would