Learning Convex QP Relaxations for Structured Prediction

We introduce a new large-margin approach to the discriminative training of intractable discrete graphical models. Our approach builds on a convex quadratic programming (QP) relaxation of the MAP inference problem. The model parameters are trained directly within this restricted class of energy functions so as to optimize the predictions on the training data. We address the question of how to parameterize the resulting model and point out its relation to existing approaches. The primary motivation for using the QP relaxation is its computational efficiency; yet, empirically, its predictive accuracy compares favorably with that of more expensive approaches. This makes it an appealing choice for many practical tasks.
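To make the role of the relaxation concrete, the following is a minimal sketch, not the paper's implementation, of a QP relaxation of pairwise MAP inference: each discrete label is replaced by a point on the probability simplex, the energy becomes a quadratic in these marginals, and the relaxed problem is solved by projected gradient descent with Euclidean projection onto the simplex. The toy chain model, its potentials, and the step size are all illustrative assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def qp_map_relaxation(unaries, pairwise, edges, steps=200, lr=0.2):
    """Minimize sum_i mu_i . theta_i + sum_(i,j) mu_i^T P_ij mu_j
    over mu_i in the simplex, by projected gradient descent."""
    mu = [np.full(len(t), 1.0 / len(t)) for t in unaries]  # uniform start
    for _ in range(steps):
        grads = [t.copy() for t in unaries]
        for (i, j), P in zip(edges, pairwise):
            grads[i] += P @ mu[j]             # coupling term seen by node i
            grads[j] += P.T @ mu[i]           # and by node j
        mu = [project_simplex(m - lr * g) for m, g in zip(mu, grads)]
    return [int(np.argmax(m)) for m in mu]    # round to a discrete labeling

# Toy 3-node chain with 2 labels: every unary prefers label 0,
# and a Potts-style pairwise term encourages neighbors to agree.
unaries = [np.array([0.0, 1.0]) for _ in range(3)]
potts = 0.5 * np.array([[0.0, 1.0], [1.0, 0.0]])
edges = [(0, 1), (1, 2)]
labels = qp_map_relaxation(unaries, [potts, potts], edges)
```

On this toy instance the relaxed solution is integral and rounding recovers the labeling `[0, 0, 0]`. In general the QP here need not be convex; the paper's point is to restrict learning to a convex subclass of such energies, which this sketch does not attempt.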
