Structured Prediction, Dual Extragradient and Bregman Projections

We present a simple and scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods based on the dual extragradient algorithm (Nesterov, 2003). The projection step can be solved using dynamic programming or combinatorial algorithms for min-cost convex flow, depending on the structure of the problem. We show that this approach provides a memory-efficient alternative to formulations based on reductions to a quadratic program (QP). We analyze the convergence of the method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
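
The projection-based saddle-point scheme described above can be illustrated with a minimal sketch. It assumes the basic extragradient method of Korpelevich [1], not the dual extragradient variant of Nesterov that the paper uses, and it runs on a hypothetical toy objective L(x, y) = (x - 0.3)(y - 0.2) over the box [-1, 1]^2. The names `project_box`, `operator`, and `extragradient` are illustrative and do not come from the paper. Coordinate-wise clipping stands in for the structured projection step, which in the paper is solved by dynamic programming or min-cost convex flow.

```python
def project_box(u, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (coordinate-wise clipping)."""
    return [min(hi, max(lo, ui)) for ui in u]


def operator(u, x_star=0.3, y_star=0.2):
    """Monotone operator F(x, y) = (dL/dx, -dL/dy) for the toy objective
    L(x, y) = (x - x_star) * (y - y_star), minimized over x and maximized over y."""
    x, y = u
    return [y - y_star, -(x - x_star)]


def extragradient(u0, step=0.5, iterations=200):
    """Basic (Korpelevich) extragradient iteration with box projection.

    The step size is assumed to be below 1/L, where L is the Lipschitz
    constant of the operator (L = 1 for this toy problem).
    """
    u = list(u0)
    for _ in range(iterations):
        # Prediction step: projected gradient step from the current point.
        g = operator(u)
        v = project_box([ui - step * gi for ui, gi in zip(u, g)])
        # Correction step: step again from the current point, but using
        # the operator evaluated at the predicted point.
        g = operator(v)
        u = project_box([ui - step * gi for ui, gi in zip(u, g)])
    return u


if __name__ == "__main__":
    x, y = extragradient([0.0, 0.0])
    print(x, y)  # approaches the saddle point (0.3, 0.2)
```

Only the current iterate and one intermediate point are stored, which hints at why projection methods of this kind can be memory-efficient compared with building an explicit QP. A plain projected gradient step (without the correction step) would not converge on this bilinear objective, which is why the second operator evaluation is needed.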

[1]  G. M. Korpelevich.  The extragradient method for finding saddle points and other problems, 1976.

[2]  Leslie G. Valiant, et al.  The Complexity of Computing the Permanent, 1979, Theor. Comput. Sci.

[3]  Lamberto Cesari, et al.  Optimization-Theory And Applications, 1983.

[4]  D. Greig, et al.  Exact Maximum A Posteriori Estimation for Binary Images, 1989.

[5]  Mark Jerrum, et al.  Polynomial-Time Approximation Algorithms for the Ising Model, 1990, SIAM J. Comput.

[6]  Paul Tseng, et al.  An ε-Relaxation Method for Separable Convex Cost Network Flow Problems, 1997, SIAM J. Optim.

[7]  Yoshua Bengio, et al.  Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[8]  Dimitri P. Bertsekas, et al.  Network optimization: continuous and discrete models, 1998.

[9]  R. Durbin, et al.  Biological sequence analysis: Background on probability, 1998.

[10]  John C. Platt, et al.  Fast training of support vector machines using sequential minimal optimization, 1999, Advances in Kernel Methods.

[11]  Joseph Naor, et al.  Approximation algorithms for the metric labeling problem via a new linear programming formulation, 2001, SODA '01.

[12]  Andrew McCallum, et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[13]  Michael Collins, et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, 2002, EMNLP.

[14]  L. Liao, et al.  Improvements of Some Projection Methods for Monotone Nonlinear Variational Inequalities, 2002.

[15]  P. Tseng, et al.  Implementation and Test of Auction Methods for Solving Generalized Network Flow Problems with Separable Convex Cost, 2002.

[16]  Martial Hebert, et al.  Discriminative Fields for Modeling Spatial Dependencies in Natural Images, 2003, NIPS.

[17]  Ben Taskar, et al.  Max-Margin Markov Networks, 2003, NIPS.

[18]  Ted Pedersen, et al.  An Evaluation Exercise for Word Alignment, 2003, ParallelTexts@NAACL-HLT.

[19]  Thomas Hofmann, et al.  Hidden Markov Support Vector Machines, 2003, ICML.

[20]  Hermann Ney, et al.  A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[21]  Alexander Schrijver, et al.  Combinatorial optimization. Polyhedra and efficiency, 2003.

[22]  Pierre Baldi, et al.  Large-Scale Prediction of Disulphide Bond Connectivity, 2004, NIPS.

[23]  Xiaojin Zhu, et al.  Kernel conditional random fields: representation and clique selection, 2004, ICML.

[24]  Hermann Ney, et al.  Symmetric Word Alignments for Statistical Machine Translation, 2004, COLING.

[25]  P. Kantor.  Foundations of Statistical Natural Language Processing, 2001, Information Retrieval.

[26]  R. Zabih, et al.  What energy functions can be minimized via graph cuts, 2004.

[27]  Ben Taskar, et al.  Learning associative Markov networks, 2004, ICML.

[28]  Joseph Naor, et al.  A Linear Programming Formulation and Approximation Algorithms for the Metric Labeling Problem, 2005, SIAM J. Discret. Math.

[29]  Thomas Hofmann, et al.  Support vector machine learning for interdependent and structured output spaces, 2004, ICML.

[30]  Saharon Rosset, et al.  Tracking Curved Regularized Optimization Solution Paths, 2004, NIPS.

[31]  Ben Taskar, et al.  Exponentiated Gradient Algorithms for Large-margin Structured Classification, 2004, NIPS.

[32]  Robert Tibshirani, et al.  The Entire Regularization Path for the Support Vector Machine, 2004, J. Mach. Learn. Res.

[33]  Martin J. Wainwright, et al.  On the Optimality of Tree-reweighted Max-product Message-passing, 2005, UAI.

[34]  Ben Taskar, et al.  Discriminative learning of Markov random fields for segmentation of 3D scan data, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Ben Taskar, et al.  A Discriminative Matching Approach to Word Alignment, 2005, HLT.

[36]  Ben Taskar, et al.  Learning structured prediction models: a large margin approach, 2005, ICML.

[37]  Martin J. Wainwright, et al.  MAP estimation via agreement on trees: message-passing and linear programming, 2005, IEEE Transactions on Information Theory.

[38]  Martin J. Wainwright, et al.  MAP estimation via agreement on (hyper)trees: Message-passing and linear programming, 2005, ArXiv.

[39]  Ben Taskar, et al.  Structured Prediction via the Extragradient Method, 2005, NIPS.

[40]  Stephen P. Boyd, et al.  Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[41]  Ben Taskar, et al.  Word Alignment via Quadratic Assignment, 2006, NAACL.

[42]  Yurii Nesterov, et al.  Dual extrapolation and its applications to solving variational inequalities and related problems, 2003, Math. Program.