Smooth and Strong: MAP Inference with Linear Convergence

Maximum a posteriori (MAP) inference is an important task in many applications. Although the standard formulation gives rise to a hard combinatorial optimization problem, several effective approximations have been proposed and studied in recent years. We focus on linear programming (LP) relaxations, which have achieved state-of-the-art performance in many applications. However, optimizing the resulting program is in general challenging due to non-smoothness and complex, non-separable constraints. In this work we therefore study the benefits of augmenting the objective function of the relaxation with strong convexity. Specifically, we introduce strong convexity by adding a quadratic term to the LP relaxation objective. We provide theoretical guarantees for the resulting programs, bounding the difference between their optimal values and the original optimum. Furthermore, we propose suitable optimization algorithms and analyze their convergence.
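To make the construction concrete, the following is a minimal sketch in standard notation; the symbols $\theta$ (the potential vector), $\mu$ (the marginal vector), the local marginal polytope $\mathcal{M}_L$, and the coefficient $\lambda$ are our own notational assumptions and need not match the paper's exact formulation. The LP relaxation of MAP inference maximizes a linear objective over the local marginal polytope,

$$\max_{\mu \in \mathcal{M}_L} \; \theta^\top \mu ,$$

and adding a quadratic term yields a strongly concave surrogate,

$$\max_{\mu \in \mathcal{M}_L} \; \theta^\top \mu - \frac{\lambda}{2}\,\|\mu\|_2^2 , \qquad \lambda > 0 .$$

Because the added term changes the objective value by at most $\frac{\lambda}{2}\max_{\mu \in \mathcal{M}_L}\|\mu\|_2^2$, a small $\lambda$ keeps the perturbed optimum close to the original one, while $\lambda$-strong concavity (together with smoothness) is the property that allows first-order methods to attain linear convergence rates.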
