Structural Learning with Amortized Inference

Training a structured prediction model involves solving many loss-augmented inference problems. Over the course of training, many of these problems, although different, share the same solution. We propose AI-DCD, an Amortized Inference framework for the Dual Coordinate Descent method, an approximate learning algorithm that accelerates training by exploiting this redundancy of solutions without compromising the performance of the model. We show the efficacy of our method by training a structured SVM using dual coordinate descent on an entity-relation extraction task. Our method learns the same model as an exact training algorithm would, but calls the inference engine for only 10% to 24% of the inference problems encountered during training. We observe similar gains on a multi-label classification task and with a Structured Perceptron model on the entity-relation task.
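To make the idea of reusing solutions concrete, the following is a minimal sketch, not the AI-DCD algorithm itself. It assumes inference is a 0-1 linear maximization over a fixed feasible set, and it uses a sufficient reuse condition known from earlier amortized-inference work (a cached solution stays optimal if no coefficient of an active variable decreased and no coefficient of an inactive variable increased). The abstract does not state this condition; the names `solve_exact` and `AmortizedSolver` and the brute-force solver are hypothetical stand-ins for a real inference engine.

```python
from itertools import product


def solve_exact(coeffs, feasible):
    """Brute-force stand-in for an inference engine:
    returns the feasible 0/1 vector y maximizing sum_i coeffs[i] * y[i]."""
    return max(feasible, key=lambda y: sum(c * v for c, v in zip(coeffs, y)))


class AmortizedSolver:
    """Wraps the solver with a cache of (coefficients, solution) pairs.

    Assumption (not from the abstract): all problems share one feasible set,
    so only the objective coefficients change between problems.
    """

    def __init__(self, feasible):
        self.feasible = list(feasible)
        self.cache = []          # list of (coeffs, solution) pairs
        self.solver_calls = 0    # how often the real solver ran
        self.total_calls = 0     # how many problems were posed

    @staticmethod
    def _reusable(old_c, sol, new_c):
        # The cached solution remains optimal if every active variable
        # (y_i = 1) has a coefficient that did not decrease and every
        # inactive variable (y_i = 0) has one that did not increase.
        return all(
            (n >= o) if y == 1 else (n <= o)
            for o, y, n in zip(old_c, sol, new_c)
        )

    def solve(self, coeffs):
        self.total_calls += 1
        for old_c, sol in self.cache:
            if self._reusable(old_c, sol, coeffs):
                return sol       # cache hit: no solver call needed
        self.solver_calls += 1
        sol = solve_exact(coeffs, self.feasible)
        self.cache.append((tuple(coeffs), sol))
        return sol


# Toy feasible set: exactly one of three binary variables is active.
one_hot = [y for y in product((0, 1), repeat=3) if sum(y) == 1]
solver = AmortizedSolver(one_hot)

first = solver.solve([1.0, 0.2, 0.1])    # cache is empty, solver runs
second = solver.solve([1.5, 0.1, 0.0])   # condition holds, solution reused
third = solver.solve([0.5, 0.9, 0.1])    # condition fails, solver runs again
```

In this toy run the second problem has different coefficients from the first but the same solution, so it is answered from the cache. The ratio `solver.solver_calls / solver.total_calls` is the kind of quantity the reported 10% to 24% figure measures, although the numbers here say nothing about the paper's experiments.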
