A Credit Assignment Compiler for Joint Prediction

Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods that cast such a complex decision problem as a sequence of decisions in a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensome and awkward. In this paper, we show that the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements to the compiler, this radically reduces both programming complexity and running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.
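
To make the idea of a search space defined by an imperative program concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy sequence-tagging task; the names `run_tagger`, `per_action_costs`, and the `predict` callback signature are hypothetical and chosen only for illustration. The program makes one `predict` call per word and reports a joint (Hamming) loss. A simple driver then assigns credit to each decision by rerunning the program, forcing one action at one step and following the reference labels everywhere else.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback type: (word, previous tag, reference tag) -> chosen tag.
Predict = Callable[[str, int, int], int]


def run_tagger(words: Sequence[str], gold: Sequence[int],
               predict: Predict) -> Tuple[List[int], int]:
    """Imperative program defining the search space: one decision per word."""
    output: List[int] = []
    prev = 0
    for word, ref in zip(words, gold):
        tag = predict(word, prev, ref)  # each call is one search step
        output.append(tag)
        prev = tag
    # Joint loss declared once, at the end of the program (Hamming loss).
    loss = sum(p != g for p, g in zip(output, gold))
    return output, loss


def per_action_costs(words: Sequence[str], gold: Sequence[int],
                     n_tags: int) -> List[List[int]]:
    """Credit assignment by re-execution (simplifying assumption: the
    reference labels are used for both roll-in and roll-out)."""
    costs: List[List[int]] = []
    for t in range(len(words)):
        row: List[int] = []
        for a in range(n_tags):
            step = 0

            def predict(word: str, prev: int, ref: int) -> int:
                nonlocal step
                choice = a if step == t else ref  # deviate only at step t
                step += 1
                return choice

            _, loss = run_tagger(words, gold, predict)
            row.append(loss)
        costs.append(row)
    return costs


if __name__ == "__main__":
    words = ["a", "b", "c"]
    gold = [1, 0, 2]
    # Each row holds the joint loss for every candidate action at that step.
    print(per_action_costs(words, gold, n_tags=3))
```

Each row of the result could serve as a cost-sensitive classification example for the corresponding decision. Note that the tagging program itself never mentions learning: the driver alone decides how to execute it and how to attribute the final loss to individual decisions.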
