Efficient Exact Inference With Loss Augmented Objective in Structured Learning

The structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms: state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives that arise from a special type of high-order potential with a decomposable internal structure. As an important application, our method covers loss-augmented inference, which enables the slack and margin scaling formulations of the structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, $F_{\beta}$-loss, intersection over union, and many other functions that can be computed efficiently from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
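For context, and using notation of our own that is not defined in the abstract (the symbols below follow the standard large-margin structured prediction literature and are not necessarily the authors'), the loss-augmented inference problems solved during training under margin and slack rescaling are, for a training pair $(x_i, y_i)$, weight vector $w$, joint feature map $\psi$, and dissimilarity measure $\Delta$:

$$\hat{y}_{\mathrm{MR}} = \arg\max_{y \in \mathcal{Y}} \; \Delta(y_i, y) + \langle w, \psi(x_i, y) \rangle,$$

$$\hat{y}_{\mathrm{SR}} = \arg\max_{y \in \mathcal{Y}} \; \Delta(y_i, y)\,\bigl(1 + \langle w, \psi(x_i, y) \rangle - \langle w, \psi(x_i, y_i) \rangle\bigr).$$

When $\Delta$ is a function of the contingency table, such as the $F_{\beta}$-loss

$$\Delta_{F_{\beta}}(y_i, y) = 1 - \frac{(1+\beta^{2})\,\mathrm{TP}}{(1+\beta^{2})\,\mathrm{TP} + \beta^{2}\,\mathrm{FN} + \mathrm{FP}},$$

the loss term does not decompose over individual output variables, which is precisely the nondecomposable setting that the proposed exact inference algorithm addresses.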
