Optimization Transfer Using Surrogate Objective Functions

Abstract: The well-known EM algorithm is an optimization transfer algorithm that depends on the notion of incomplete or missing data. By invoking convexity arguments, one can construct a variety of other optimization transfer algorithms that do not involve missing data. These algorithms all rely on a majorizing or minorizing function that serves as a surrogate for the objective function. Optimizing the surrogate function drives the objective function in the correct direction. This article illustrates this general principle by a number of specific examples drawn from the statistical literature. Because optimization transfer algorithms often exhibit the slow convergence of EM algorithms, two methods of accelerating optimization transfer are discussed and evaluated in the context of specific problems.
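
As an illustration of the majorization idea described in the abstract, the sketch below is one possible instance and is not taken from the article itself. It minimizes a smoothed sum of absolute deviations, f(theta) = sum_i sqrt((x_i - theta)^2 + eps), whose minimizer lies close to the sample median. Because the square root is concave, each term is bounded above by its tangent line in (x_i - theta)^2, which gives a quadratic surrogate g(theta | theta_k) that touches f at the current iterate theta_k. Minimizing g is a weighted least squares problem, and the chain f(theta_{k+1}) <= g(theta_{k+1} | theta_k) <= g(theta_k | theta_k) = f(theta_k) shows that the objective cannot increase. The function names `smoothed_l1` and `mm_median`, the smoothing constant `eps`, and the stopping rule are all assumptions made for this example.

```python
import math


def smoothed_l1(theta, xs, eps):
    """Smoothed objective: sum of sqrt((x - theta)^2 + eps)."""
    return sum(math.sqrt((x - theta) ** 2 + eps) for x in xs)


def mm_median(xs, eps=1e-8, tol=1e-10, max_iter=500):
    """Minimize smoothed_l1 by repeatedly minimizing a quadratic majorizer.

    Returns the final iterate and the objective value at every iterate.
    """
    theta = sum(xs) / len(xs)  # start from the sample mean
    history = [smoothed_l1(theta, xs, eps)]
    for _ in range(max_iter):
        # Tangent-line bound on sqrt gives weights 1 / sqrt(r_i^2 + eps),
        # where r_i is the residual at the current iterate.
        weights = [1.0 / math.sqrt((x - theta) ** 2 + eps) for x in xs]
        # The surrogate is a weighted sum of squares, so its minimizer
        # is the weighted mean of the data.
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        history.append(smoothed_l1(new_theta, xs, eps))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, history


if __name__ == "__main__":
    theta, history = mm_median([1.0, 2.0, 3.0, 4.0, 100.0])
    print("estimate:", theta)
    print("objective values:", history)
```

Each pass needs only a weighted mean, and no missing data are involved. The recorded objective values make the descent property easy to inspect on a small data set.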
