Training parsers by inverse reinforcement learning

One major idea in structured prediction is to assume that the predictor computes its output by maximizing a score function over candidate structures. Training such a predictor can then be cast as the problem of finding the weights of the score function so that the predictor's output on each training input matches the corresponding structured label. A similar problem is studied in inverse reinforcement learning (IRL), where one is given an environment and a set of trajectories, and the goal is to find a reward function such that an agent acting optimally with respect to it would follow trajectories matching those in the training set. In this paper we show how IRL algorithms can be applied to structured prediction, in particular to parser training. We present a number of recent incremental IRL algorithms in a unified framework and map them to parser training algorithms. This allows us to recover some existing parser training algorithms, as well as to obtain a new one. The resulting algorithms are compared in terms of their sensitivity to the choice of various parameters and their generalization ability on the Penn Treebank WSJ corpus.
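
The abstract frames both structured prediction and IRL as fitting the weights of a score (or reward) function until the score-maximizing output matches the demonstrated one. Below is a minimal sketch of that shared template, assuming a linear score w·φ(x, y) and an explicit candidate set standing in for a parser's search space; the names (`features`, `decode`, `perceptron_train`) and the toy data are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Sketch only: a structured-perceptron-style update, read as feature-matching.
# In IRL terms, w plays the role of reward weights, and the update nudges w
# until the optimal structure under w matches the demonstrated (gold) one.

def features(x, y):
    """Joint feature vector phi(x, y); a stand-in for real parse features."""
    rng = np.random.default_rng(hash((x, y)) % (2**32))
    return rng.normal(size=8)

def decode(w, x, candidates):
    """Argmax oracle: return the candidate structure with the highest score."""
    return max(candidates, key=lambda y: w @ features(x, y))

def perceptron_train(data, candidates, epochs=5, lr=1.0):
    """Move w toward the gold structure's features and away from the
    current argmax's features whenever the prediction is wrong."""
    w = np.zeros(8)
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = decode(w, x, candidates[x])
            if y_hat != y_gold:
                w += lr * (features(x, y_gold) - features(x, y_hat))
    return w

# Toy usage: a sentence with a small hand-made candidate set of "parses".
candidates = {"the dog barks": ["NP-VP", "VP-NP", "NP-NP"]}
data = [("the dog barks", "NP-VP")]
w = perceptron_train(data, candidates)
print(decode(w, "the dog barks", candidates["the dog barks"]))
```

The point of the sketch is only the correspondence: the gold parse plays the role of a demonstrated trajectory, and the weight update is one simple instance of matching the features of the predicted and demonstrated structures.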
