1 Introduction to Dual Decomposition for Inference

Many inference problems with discrete variables result in a difficult combinatorial optimization problem. In recent years, the technique of dual decomposition, also called Lagrangian relaxation, has proven to be a powerful means of solving these inference problems by decomposing them into simpler components that are repeatedly solved independently and combined into a global solution. In this chapter, we introduce the general technique of dual decomposition through its application to the problem of finding the most likely (MAP) assignment in Markov random fields. We discuss both subgradient and block coordinate descent approaches to solving the dual problem. The resulting message-passing algorithms are similar to max-product, but can be shown to solve a linear programming relaxation of the MAP problem. We show how many of the MAP algorithms are related to each other, and also quantify when the MAP solution can and cannot be decoded directly from the dual solution.

1.1 Introduction

Many problems in engineering and the sciences require solutions to challenging combinatorial optimization problems. These include traditional problems such as scheduling, planning, fault diagnosis, or searching for molecular conformations. In addition, a wealth of combinatorial problems arise directly from probabilistic modeling (graphical models). Graphical models (see Koller and Friedman, 2009, for a textbook introduction) have been widely adopted in areas such as computational biology, machine vision, and natural language processing, and are increasingly being used as a framework for expressing combinatorial problems.

Consider, for example, a protein side-chain placement problem where the goal is to find the minimum energy conformation of amino acid side-chains along a fixed carbon backbone. The orientations of the side-chains are represented by discretized angles called rotamers. The combinatorial difficulty arises here from the fact that rotamer choices for nearby amino acids are energetically coupled. For globular proteins, for example, such couplings may be present for most pairs of side-chain orientations. This problem is couched in probabilistic modeling terms by associating molecular conformations with the setting of discrete random variables corresponding to the rotamer angles. The interactions between such random variables come from the energetic couplings between nearby amino acids. Finding the minimum energy conformation is then equivalently solved by finding the most probable assignment of states to the variables.

We will consider here combinatorial problems that are expressed in terms of structured probability models (graphical models). A graphical model is defined over a set of discrete variables $x = \{x_j\}_{j \in V}$. Local …
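
The equivalence between the minimum energy conformation and the most probable assignment, as described for the side-chain placement example, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three variables, their two rotamer states each, the energy values, the pairwise form of the energy, and the convention that probability is proportional to exp(-energy). The function names are hypothetical, and the search is plain brute-force enumeration, not dual decomposition.

```python
import itertools
import math

# Hypothetical toy problem: 3 side-chains, 2 rotamer states each.
# All energy values are made up for illustration.
unary = {
    0: [0.5, 1.0],
    1: [0.2, 0.1],
    2: [1.5, 0.3],
}
# Pairwise couplings between "nearby" side-chains (assumed chain 0-1-2).
pairwise = {
    (0, 1): [[0.0, 2.0], [1.0, 0.0]],
    (1, 2): [[0.4, 0.0], [0.0, 1.2]],
}


def energy(x):
    """Total energy of assignment x: unary terms plus pairwise couplings."""
    e = sum(unary[j][x[j]] for j in unary)
    e += sum(table[x[i]][x[j]] for (i, j), table in pairwise.items())
    return e


def all_assignments():
    """Enumerate every joint assignment (exponential in the number of variables)."""
    return itertools.product(*(range(len(unary[j])) for j in sorted(unary)))


def min_energy_assignment():
    """Assignment with the lowest total energy."""
    return min(all_assignments(), key=energy)


def map_assignment():
    """Most probable assignment under the assumed model p(x) = exp(-E(x)) / Z."""
    weights = {x: math.exp(-energy(x)) for x in all_assignments()}
    z = sum(weights.values())  # normalizer; it does not change the argmax
    probs = {x: w / z for x, w in weights.items()}
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print("min-energy assignment:", min_energy_assignment())
    print("MAP assignment:       ", map_assignment())
```

Both functions return the same assignment because exp(-E) is strictly decreasing in E. The enumeration visits every joint state, so it is only feasible for tiny models, which is the combinatorial difficulty that motivates decomposition methods.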

[1] Philip Wolfe, et al. Validation of subgradient optimization, 1974, Math. Program.

[2] Donald Erlenkotter, et al. A Dual-Based Procedure for Uncapacitated Facility Location, 1978, Oper. Res.

[3] Leslie G. Valiant, et al. NP is as easy as detecting unique solutions, 1985, STOC '85.

[4] Naum Zuselevich Shor, et al. Minimization Methods for Non-Differentiable Functions, 1985, Springer Series in Computational Mathematics.

[5] Monique Guignard-Spielberg, et al. Lagrangean decomposition: A model yielding stronger lagrangean bounds, 1987, Math. Program.

[6] M. Rosenwein, et al. An application-oriented guide for designing Lagrangean dual ascent algorithms, 1989.

[7] Dimitri P. Bertsekas, et al. Auction algorithms for network flow problems: A tutorial introduction, 1992, Comput. Optim. Appl.

[8] Robert J. Vanderbei, et al. Linear Programming: Foundations and Extensions, 1998, Kluwer international series in operations research and management science.

[9] M. Guignard. Lagrangean relaxation, 2003.

[10] Martin J. Wainwright, et al. Tree-based reparameterization framework for analysis of sum-product and related algorithms, 2003, IEEE Trans. Inf. Theory.

[11] Marshall L. Fisher, et al. The Lagrangian Relaxation Method for Solving Integer Programming Problems, 2004, Manag. Sci.

[12] Martin J. Wainwright, et al. On the Optimality of Tree-reweighted Max-product Message-passing, 2005, UAI.

[13] Fernando Pereira, et al. Non-Projective Dependency Parsing using Spanning Tree Algorithms, 2005, HLT.

[14] Martin J. Wainwright, et al. MAP estimation via agreement on trees: message-passing and linear programming, 2005, IEEE Transactions on Information Theory.

[15] Daniel Tarlow, et al. Using Combinatorial Optimization within Max-Product Belief Propagation, 2006, NIPS.

[16] Ben Taskar, et al. Word Alignment via Quadratic Assignment, 2006, NAACL.

[17] Yair Weiss, et al. Linear Programming Relaxations and Belief Propagation - An Empirical Study, 2006, J. Mach. Learn. Res.

[18] Tommi S. Jaakkola, et al. Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations, 2007, NIPS.

[19] Giorgio Satta, et al. On the Complexity of Non-Projective Data-Driven Dependency Parsing, 2007, IWPT.

[20] Yair Weiss, et al. MAP Estimation, Linear Programming and Belief Propagation with Convex Free Energies, 2007, UAI.

[21] Tomás Werner, et al. A Linear Programming Approach to Max-Sum Problem: A Review, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Yair Weiss, et al. Minimizing and Learning Energy Functions for Side-Chain Prediction, 2007, RECOMB.

[23] Tomás Werner, et al. High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF), 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Jason K. Johnson, et al. Convex relaxation methods for graphical models: Lagrangian and maximum entropy approaches, 2008.

[25] Tommi S. Jaakkola, et al. Tightening LP Relaxations for MAP using Message Passing, 2008, UAI.

[26] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[27] Pushmeet Kohli, et al. Minimizing sparse higher order energy functions of discrete variables, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Laurence A. Wolsey, et al. Two “well-known” properties of subgradient optimization, 2009, Math. Program.

[29] Asuman E. Ozdaglar, et al. Approximate Primal Solutions and Rate Analysis for Dual Subgradient Methods, 2008, SIAM J. Optim.

[30] Nir Friedman, et al. Probabilistic Graphical Models - Principles and Techniques, 2009.

[31] Tommi S. Jaakkola, et al. Tree Block Coordinate Descent for MAP in Graphical Models, 2009, AISTATS.

[32] Richard S. Zemel, et al. HOP-MAP: Efficient Message Passing with High Order Potentials, 2010, AISTATS.

[33] Tamir Hazan, et al. Norm-Product Belief Propagation: Primal-Dual Message-Passing for Approximate Inference, 2009, IEEE Transactions on Information Theory.

[34] Julian Yarkony, et al. Covering trees and lower-bounds on quadratic assignment, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35] Alexander M. Rush, et al. Dual Decomposition for Parsing with Non-Projective Head Automata, 2010, EMNLP.

[36] Tommi S. Jaakkola, et al. Approximate inference in graphical models using lp relaxations, 2010.

[37] Arthur M. Geoffrion, et al. Lagrangian Relaxation for Integer Programming, 2010, 50 Years of Integer Programming.

[38] Nikos Komodakis, et al. MRF Energy Minimization and Beyond via Dual Decomposition, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39] K. Schittkowski, et al. NONLINEAR PROGRAMMING, 2022.