Learning to Speed Up Structured Output Prediction

Predicting structured outputs can be computationally expensive because the output space is combinatorially large. In this paper, we focus on reducing the prediction time of a trained black-box structured classifier without losing accuracy. To do so, we train a speedup classifier that learns to mimic the black-box classifier under the learning-to-search approach. As the structured classifier predicts more examples, the speedup classifier operates as a learned heuristic that guides search toward favorable regions of the output space. We present a mistake bound for the speedup classifier and identify inference situations where it can independently make correct judgments without input features. We evaluate our method on the task of entity and relation extraction and show that the speedup classifier outperforms even greedy search in terms of speed without loss of accuracy.
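The idea of training a fast classifier to mimic a slow black-box predictor can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method evaluated in the paper: the toy sequence-labeling task, the exhaustive-search black box, the feature templates, the names `black_box_predict`, `train_speedup`, and `speedup_predict`, and the use of plain perceptron updates with the black-box output as the reference policy.

```python
import itertools
from collections import defaultdict

LABELS = (0, 1)

def black_box_predict(x):
    """Hypothetical slow black box: exhaustive argmax over all 2^n labelings."""
    def score(y):
        s = 0
        for i, (tok, lab) in enumerate(zip(x, y)):
            if lab == tok % 2:
                s += 3              # emission score (toy assumption)
            if i > 0 and y[i - 1] == lab:
                s += 1              # transition score (toy assumption)
        return s
    return max(itertools.product(LABELS, repeat=len(x)), key=score)

def features(x, i, prev, lab):
    """Illustrative feature templates for one search step."""
    return [("tok", x[i], lab), ("prev", prev, lab)]

def greedy_step(w, x, i, prev):
    """Pick the label the learned heuristic scores highest at position i."""
    return max(LABELS, key=lambda lab: sum(w[f] for f in features(x, i, prev, lab)))

def train_speedup(examples, max_epochs=100):
    """Perceptron-style imitation of the black box, one search step at a time."""
    w = defaultdict(float)
    targets = [(x, black_box_predict(x)) for x in examples]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in targets:
            prev = None
            for i, gold in enumerate(y):
                pred = greedy_step(w, x, i, prev)
                if pred != gold:
                    mistakes += 1
                    for f in features(x, i, prev, gold):
                        w[f] += 1.0
                    for f in features(x, i, prev, pred):
                        w[f] -= 1.0
                prev = gold         # roll in with the black-box (reference) prefix
        if mistakes == 0:           # every step already matches the black box
            break
    return w

def speedup_predict(w, x):
    """Fast prediction: one greedy pass, linear in the sequence length."""
    y, prev = [], None
    for i in range(len(x)):
        prev = greedy_step(w, x, i, prev)
        y.append(prev)
    return tuple(y)
```

In this sketch the black box enumerates every labeling, while the learned heuristic makes a single left-to-right pass. Because the toy data are linearly separable under these features, the perceptron makes a bounded number of updates and then reproduces the black-box outputs on the sequences it was trained on; nothing here guarantees agreement on unseen inputs.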
