Efficient Exact Inference With Loss Augmented Objective in Structured Learning.

The structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms: state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives that arise from a special type of high-order potential with a decomposable internal structure. As an important application, our method covers loss-augmented inference, which enables the slack- and margin-scaling formulations of the structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
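To make the idea of loss-augmented inference with a contingency-table loss concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a deliberately simplified setting: independent binary labels with unary scores only (no pairwise or higher-order model terms) and margin scaling, so the objective is the model score plus the loss. Because the loss depends on the labeling only through the counts of true and false positives, one can enumerate those counts and fill each count with the highest-scoring candidates. The function names (f_beta_loss, loss_augmented_argmax, brute_force) and the convention of returning zero loss for an empty denominator are illustrative assumptions.

```python
from itertools import product


def f_beta_loss(tp, fp, fn, beta=1.0):
    """1 - F_beta, computed only from contingency-table counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    if denom == 0:
        return 0.0  # assumption: no positives present or predicted counts as no loss
    return 1.0 - (1 + b2) * tp / denom


def loss_augmented_argmax(scores, y_true, loss=f_beta_loss):
    """Maximize sum_i scores[i] * y[i] + loss(y_true, y) over binary y.

    Assumes unary scores only; enumerates (tp, fp) pairs in O(n^2)
    after sorting, instead of all 2^n labelings.
    """
    by_score = lambda i: -scores[i]
    pos = sorted((i for i, y in enumerate(y_true) if y == 1), key=by_score)
    neg = sorted((i for i, y in enumerate(y_true) if y == 0), key=by_score)

    # Prefix sums: best total score when switching on the top-k candidates.
    pos_pref = [0.0]
    for i in pos:
        pos_pref.append(pos_pref[-1] + scores[i])
    neg_pref = [0.0]
    for i in neg:
        neg_pref.append(neg_pref[-1] + scores[i])

    best_val, best_tp, best_fp = float("-inf"), 0, 0
    for tp in range(len(pos) + 1):
        fn = len(pos) - tp
        for fp in range(len(neg) + 1):
            val = pos_pref[tp] + neg_pref[fp] + loss(tp, fp, fn)
            if val > best_val:
                best_val, best_tp, best_fp = val, tp, fp

    y = [0] * len(scores)
    for i in pos[:best_tp] + neg[:best_fp]:
        y[i] = 1
    return best_val, y


def brute_force(scores, y_true, loss=f_beta_loss):
    """Reference check: exhaustive search over all labelings (small n only)."""
    best = float("-inf")
    for y in product([0, 1], repeat=len(scores)):
        tp = sum(1 for a, b in zip(y, y_true) if a == 1 and b == 1)
        fp = sum(1 for a, b in zip(y, y_true) if a == 1 and b == 0)
        fn = sum(1 for a, b in zip(y, y_true) if a == 0 and b == 1)
        val = sum(s * a for s, a in zip(scores, y)) + loss(tp, fp, fn)
        best = max(best, val)
    return best
```

The same counting idea is what makes such losses tractable in general: the nondecomposable term is a function of a few integer statistics, so the search can be organized around those statistics rather than around individual labelings. Handling models with interactions between output variables, as in the parsing and segmentation applications mentioned above, requires more than this sketch provides.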
