Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing

Pruning hypotheses during dynamic programming is commonly used to speed up inference in settings such as parsing. Unlike prior work, we train a pruning policy under an objective that measures end-to-end performance: we search for a policy that is both fast and accurate. This poses a difficult machine learning problem, which we tackle with the LOLS algorithm. LOLS training must continually compute the effects of changing pruning decisions; we show how to make this efficient in the constituency parsing setting via dynamic programming and change propagation algorithms. We find that optimizing end-to-end performance in this way leads to a better Pareto frontier, i.e., parsers that are more accurate for a given runtime.
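To make the notion of a Pareto frontier concrete, here is a minimal sketch in Python. It assumes each pruning policy has been summarized as a (runtime, accuracy) pair; the function name `pareto_frontier` and the numbers are hypothetical and chosen only for illustration, not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (runtime, accuracy); lower runtime and higher accuracy are better


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point beats on both runtime and accuracy."""
    frontier: List[Point] = []
    best_acc = float("-inf")
    # Visit points from fastest to slowest; among equal runtimes, most accurate first.
    for runtime, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        # A slower point is worth keeping only if it is strictly more accurate
        # than every faster point seen so far.
        if acc > best_acc:
            frontier.append((runtime, acc))
            best_acc = acc
    return frontier


# Made-up (runtime, accuracy) measurements for five hypothetical pruning policies.
policies = [(1.0, 0.80), (2.0, 0.85), (2.5, 0.84), (4.0, 0.90), (3.0, 0.85)]
frontier = pareto_frontier(policies)
print(frontier)
```

In this toy data, the policies at runtimes 2.5 and 3.0 are dropped because the policy at runtime 2.0 is at least as accurate and faster. A "better" frontier in the abstract's sense is one whose points sit at higher accuracy for the same runtime.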
