The Concave-Convex Procedure

The concave-convex procedure (CCCP) is a way to construct discrete-time iterative dynamical systems that are guaranteed to monotonically decrease global optimization/energy functions. This procedure can be applied to almost any optimization problem, and many existing algorithms can be interpreted in terms of it. In particular, we prove that all expectation-maximization (EM) algorithms, and classes of Legendre minimization and variational bounding algorithms, can be re-expressed in terms of CCCP. We show that many existing neural network and mean-field theory algorithms are also examples of CCCP. The generalized iterative scaling algorithm and Sinkhorn's algorithm can also be expressed as CCCP by a change of variables. CCCP can be used both as a new way to understand, and prove the convergence of, existing optimization algorithms, and as a procedure for generating new algorithms.
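For context, the mechanism behind these results is compact: CCCP decomposes an energy function E(x) into a convex part E_vex(x) and a concave part E_cave(x), then iterates by minimizing the convex surrogate obtained by linearizing the concave part at the current point, i.e. solving grad E_vex(x_{t+1}) = -grad E_cave(x_t). Each such step can only decrease E, which is the source of the monotonicity guarantee. The sketch below illustrates one possible implementation of this update on a toy one-dimensional objective; the objective, its convex/concave split, and all function names are illustrative assumptions, not taken from the paper.

```python
from scipy.optimize import minimize_scalar

# Toy nonconvex energy E(x) = x^4 - 3x^2 + x, split by hand as
#   E_vex(x)  = x^4 + x   (convex:  E_vex''(x) = 12 x^2 >= 0)
#   E_cave(x) = -3 x^2    (concave: E_cave''(x) = -6 < 0)
# The objective and the split are illustrative assumptions, not from the paper.

def E(x):        return x**4 - 3.0 * x**2 + x
def E_vex(x):    return x**4 + x
def dE_cave(x):  return -6.0 * x          # gradient of the concave part

def cccp_step(x_t):
    """One CCCP update: minimize E_vex(x) + x * dE_cave(x_t), the convex
    surrogate obtained by linearizing E_cave at x_t. Its minimizer satisfies
    grad E_vex(x_{t+1}) = -grad E_cave(x_t)."""
    surrogate = lambda x: E_vex(x) + x * dE_cave(x_t)
    return minimize_scalar(surrogate, bounds=(-5.0, 5.0), method="bounded").x

x = 2.0
for t in range(30):
    x_next = cccp_step(x)
    # The CCCP guarantee: the energy never increases from step to step.
    assert E(x_next) <= E(x) + 1e-8
    x = x_next

print(f"converged to x = {x:.4f}, E(x) = {E(x):.4f}")  # a stationary point of E
```

Run from x = 2.0, the iterates decrease E monotonically and settle near the stationary point x ≈ 1.13; fixed points of the update are exactly points where grad E vanishes, which is the sense in which CCCP yields convergence proofs for the algorithms named in the abstract.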
