Structured prediction with reinforcement learning

We formalize structured prediction as a reinforcement learning task. We first define the Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov Decision Processes for structured prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and reinforcement learning (RL) allows us to use approximate RL methods for learning the policy. The proposed model makes weak assumptions about both the nature of the structured prediction problem and the supervision process: it assumes nothing about the decomposition of the loss function, the data encoding, or the availability of optimal policies for training. It therefore handles a wide range of structured prediction problems; moreover, it scales well and can be applied to complex, large-scale real-world tasks. We describe two series of experiments. The first analyzes RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art structured prediction algorithms. The second introduces a tree transformation problem on which most previous models fail; it is a complex instance of the general labeled tree mapping problem. We show that RL exploration is effective and leads to successful results on this challenging task, a clear confirmation that RL can be used for large-scale and complex structured prediction problems.
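
To make the SP-MDP idea concrete, here is a minimal sketch (not taken from the paper; the label set, helper names, and Hamming-loss reward are assumptions chosen for illustration) that casts sequence labeling as an episodic MDP: a state is the input plus the labels predicted so far, an action appends one label, and the terminal reward is the negative Hamming loss, so that maximizing return corresponds to minimizing the empirical loss.

    # Minimal SP-MDP sketch for sequence labeling (illustrative assumptions only).
    import random

    LABELS = ["B", "I", "O"]  # hypothetical label set for a chunking-style task

    def initial_state(x):
        # A state is the input sequence plus the labels predicted so far.
        return (x, [])

    def step(state, action):
        # Appending one label is the only kind of action in this sketch.
        x, y_partial = state
        y_next = y_partial + [action]
        done = len(y_next) == len(x)
        return (x, y_next), done

    def terminal_reward(state, y_ref):
        # Reward = negative Hamming loss, so maximizing return minimizes the loss.
        _, y_pred = state
        return -sum(p != r for p, r in zip(y_pred, y_ref))

    def rollout(x, y_ref, policy):
        # One episode: the policy labels the sequence token by token.
        state, done = initial_state(x), False
        while not done:
            state, done = step(state, policy(state))
        return terminal_reward(state, y_ref)

    # Example episode with a uniformly random policy standing in for a learned one.
    x = ["the", "cat", "sat"]
    y_ref = ["B", "I", "O"]
    print(rollout(x, y_ref, lambda s: random.choice(LABELS)))

Under this view, any approximate policy learning method (for instance SARSA or a policy-gradient algorithm) could be plugged into such a rollout in place of the random policy; the sketch only illustrates how the supervised loss becomes the episode's return.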
