Message passing algorithms for optimization

The max-product algorithm, which attempts to compute the most probable (MAP) assignment of a given probability distribution via a distributed, local message passing scheme, has recently found applications in convex minimization and combinatorial optimization. Unfortunately, the max-product algorithm is not guaranteed to converge and, even if it does, is not guaranteed to produce the MAP assignment. Many alternative message passing schemes have been proposed to overcome these difficulties (e.g. TRMP, MPLP, max-sum diffusion). These algorithms can be viewed as coordinate ascent schemes over different duals of a linear programming formulation of the MAP problem. If these algorithms converge to a unique assignment, then this assignment is guaranteed to maximize the objective function. Although these algorithms provide stronger guarantees than max-product upon convergence, they do not always converge to a unique assignment, and in some instances the resulting dual optimization problem provides only a trivial upper bound on the value of the maximizing assignment. In this work, we provide a systematic study of message passing algorithms for the related problem of minimizing an arbitrary real-valued objective function: from graphical models to reparameterizations, from reparameterizations to lower bounds, and from lower bounds to convergent message passing algorithms. We generalize the known results by providing conditions under which the assignments produced by message passing algorithms correspond to local and global optima, by providing a combinatorial characterization of when these message passing schemes can actually solve the minimization problem, and by providing a new convergent and correct message passing algorithm, called the splitting algorithm, that contains many of the known convergent message passing algorithms as special cases. These results extend the usefulness of the splitting algorithm beyond the limits of other message passing algorithms.
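
To make the "distributed, local message passing scheme" concrete, the following is a minimal Python sketch of min-sum (max-product applied to negative log-probabilities) on a chain, where it is known to be exact because a chain is a tree. The function names (`energy`, `min_sum_chain`, `naive_lower_bound`) and the list-of-tables cost representation are hypothetical choices for illustration, not the notation of this work. The sketch also computes the sum of per-factor minima, the simplest instance of the kind of lower bound that reparameterization-based (dual) methods try to raise.

```python
import itertools
import random

def energy(x, unary, pairwise):
    """Objective f(x) = sum_i phi_i(x_i) + sum_i psi_{i,i+1}(x_i, x_{i+1})."""
    return (sum(unary[i][x[i]] for i in range(len(x)))
            + sum(pairwise[i][x[i]][x[i + 1]] for i in range(len(x) - 1)))

def min_sum_chain(unary, pairwise):
    """Min-sum on a chain x_0 - x_1 - ... - x_{n-1} (exact, since a chain is a tree)."""
    n, k = len(unary), len(unary[0])
    # fwd[i][b]: min cost of all factors strictly left of x_i, given x_i = b.
    fwd = [[0.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = min(fwd[i - 1][a] + unary[i - 1][a] + pairwise[i - 1][a][b]
                            for a in range(k))
    # bwd[i][a]: min cost of all factors strictly right of x_i, given x_i = a.
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = min(pairwise[i][a][b] + unary[i + 1][b] + bwd[i + 1][b]
                            for b in range(k))
    # Min-marginals: beliefs[i][a] = min of f over all x with x_i = a.
    beliefs = [[fwd[i][a] + unary[i][a] + bwd[i][a] for a in range(k)]
               for i in range(n)]
    # Decoding each node by its own argmin is valid when the minimizer is unique.
    x = [min(range(k), key=beliefs[i].__getitem__) for i in range(n)]
    return x, beliefs

def naive_lower_bound(unary, pairwise):
    """Sum of per-factor minima: a valid (often loose) lower bound on min f."""
    return (sum(min(u) for u in unary)
            + sum(min(min(row) for row in p) for p in pairwise))

random.seed(0)
n, k = 6, 3
unary = [[random.uniform(0, 1) for _ in range(k)] for _ in range(n)]
pairwise = [[[random.uniform(0, 1) for _ in range(k)] for _ in range(k)]
            for _ in range(n - 1)]
x, beliefs = min_sum_chain(unary, pairwise)
# Brute force over all k**n assignments, for comparison only.
best = min(energy(y, unary, pairwise) for y in itertools.product(range(k), repeat=n))
print(x, energy(x, unary, pairwise), best, naive_lower_bound(unary, pairwise))
```

On a tree such as this chain, the minimum of every node's min-marginal equals the global minimum. The difficulties described above arise on graphs with cycles, where the same local updates need not converge or may produce a non-minimizing assignment.
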
We show that there are convex minimization problems on which convergent message passing algorithms fail to produce a minimizing assignment but on which the splitting algorithm succeeds. We use graph covers and our conditions for local optimality to provide conditions under which the splitting algorithm can be used to solve general convex (as well as submodular) minimization problems. These observations lead us to a generalization of diagonal dominance for arbitrary convex functions.
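
For quadratic objectives, min-sum messages remain quadratic, so each message can be stored as two numbers. The sketch below assumes f(x) = (1/2) x^T G x - h^T x with G symmetric and strictly diagonally dominant with a positive diagonal, and runs plain synchronous min-sum (not the splitting algorithm) on a 5-cycle. Diagonal dominance is a classical sufficient condition for min-sum to converge to the minimizer of such a quadratic even on graphs with cycles, and it is the special case that the convex generalization above extends. The names `is_diagonally_dominant` and `min_sum_quadratic`, the iteration count, and the test matrix are illustrative assumptions.

```python
def is_diagonally_dominant(G):
    """True if G[i][i] > sum_{j != i} |G[i][j]| for every row i.

    For symmetric G this strict, positive-diagonal condition implies that G
    is positive definite, so f has a unique minimizer.
    """
    n = len(G)
    return all(G[i][i] > sum(abs(G[i][j]) for j in range(n) if j != i)
               for i in range(n))

def min_sum_quadratic(G, h, iters=500):
    """Synchronous min-sum for f(x) = 0.5 * x^T G x - h^T x, G symmetric.

    Message m_{i->j}(x_j) is stored as 0.5 * a[i, j] * x_j**2 - b[i, j] * x_j
    (additive constants are dropped; they do not affect the minimizer).
    """
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and G[i][j] != 0] for i in range(n)]
    a = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    b = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new_a, new_b = {}, {}
        for (i, j) in a:
            # Collect x_i's own terms plus messages from neighbors other than j.
            A = G[i][i] + sum(a[(k, i)] for k in nbrs[i] if k != j)
            B = h[i] + sum(b[(k, i)] for k in nbrs[i] if k != j)
            # Minimize 0.5*A*x_i^2 - B*x_i + G[i][j]*x_i*x_j over x_i (needs A > 0).
            new_a[(i, j)] = -G[i][j] ** 2 / A
            new_b[(i, j)] = -G[i][j] * B / A
        a, b = new_a, new_b
    # Each belief is 0.5*P*x_i^2 - q*x_i; its minimizer is q / P.
    return [(h[i] + sum(b[(k, i)] for k in nbrs[i]))
            / (G[i][i] + sum(a[(k, i)] for k in nbrs[i])) for i in range(n)]

# A 5-cycle: each row has diagonal 3 and two off-diagonal entries of -1.
n = 5
G = [[0.0] * n for _ in range(n)]
for i in range(n):
    G[i][i] = 3.0
    G[i][(i + 1) % n] = G[(i + 1) % n][i] = -1.0
h = [1.0, 2.0, 3.0, 4.0, 5.0]
x = min_sum_quadratic(G, h)
# The minimizer of f solves G x = h, so the residual checks correctness.
residual = max(abs(sum(G[i][j] * x[j] for j in range(n)) - h[i]) for i in range(n))
print(is_diagonally_dominant(G), residual)
```

At a fixed point the message-passing estimate satisfies G x = h, so the printed residual should be at the level of floating-point round-off. Without diagonal dominance (or a related condition) the same updates can fail to converge even when G is positive definite.
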
