Efficient programmable learning to search

We improve "learning to search" approaches to structured prediction in two ways. First, we show that the search space can be defined by an arbitrary imperative program, reducing the number of lines of code required to develop new structured prediction tasks by orders of magnitude. Second, we make structured prediction orders of magnitude faster through various algorithmic improvements. We demonstrate the feasibility of our approach on three structured prediction tasks: two variants of sequence labeling, and entity-relation resolution. In all cases we obtain accuracies at least as high as alternative approaches, with drastically reduced execution and programming time.
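To make the first point concrete, the following is a minimal sketch of what "a search space defined by an imperative program" could look like for sequence labeling. It is an illustration under stated assumptions, not the interface of any actual system: the `Search` class, its `predict` and `loss` methods, `run_sequence_labeler`, and `capitalized_policy` are all hypothetical names introduced here. The idea shown is that an ordinary loop making one prediction per token, and declaring a loss along the way, implicitly defines the states and actions a learning-to-search algorithm would explore.

```python
from typing import Callable, List, Sequence


class Search:
    """Hypothetical stand-in for a learning-to-search runtime (not a real library API)."""

    def __init__(self, policy: Callable[[str, int], int]):
        self.policy = policy      # maps (word, previous tag) to a predicted tag
        self.total_loss = 0.0     # loss declared by the program during one run

    def predict(self, word: str, prev_tag: int) -> int:
        # In a real system this call is where the learner would intervene
        # (rolling in, rolling out, or collecting training examples).
        return self.policy(word, prev_tag)

    def loss(self, value: float) -> None:
        # The program reports how bad its output was; here, Hamming loss.
        self.total_loss += value


def run_sequence_labeler(search: Search,
                         words: Sequence[str],
                         gold: Sequence[int]) -> List[int]:
    """An ordinary imperative program: each predict() call is one search step."""
    output: List[int] = []
    prev = 0
    for word, truth in zip(words, gold):
        tag = search.predict(word, prev)
        search.loss(float(tag != truth))
        output.append(tag)
        prev = tag
    return output


def capitalized_policy(word: str, prev_tag: int) -> int:
    """Toy hand-written policy (assumption): tag 1 for capitalized words, else 0."""
    return 1 if word[:1].isupper() else 0


words = ["Alice", "visited", "Paris", "yesterday"]
gold = [1, 0, 1, 0]
search = Search(capitalized_policy)
tags = run_sequence_labeler(search, words, gold)
print(tags, search.total_loss)
```

In this sketch the task-specific code is only the short loop in `run_sequence_labeler`; everything about how predictions are learned would live behind `predict`, which is the kind of separation the abstract credits for the reduction in programming effort.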
