Deep Reinforcement Learning for NLP

Many Natural Language Processing (NLP) tasks (including generation, language grounding, reasoning, information extraction, coreference resolution, and dialog) can be formulated as deep reinforcement learning (DRL) problems. However, since language is often discrete and the space for all sentences is infinite, there are many challenges for formulating reinforcement learning problems of NLP tasks. In this tutorial, we provide a gentle introduction to the foundation of deep reinforcement learning, as well as some practical DRL solutions in NLP. We describe recent advances in designing deep reinforcement learning for NLP, with a special focus on generation, dialogue, and information extraction. Finally, we discuss why they succeed, and when they may fail, aiming at providing some practical advice about deep reinforcement learning for solving real-world NLP problems.

[1]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[2]  William Yang Wang,et al.  Robust Distant Supervision Relation Extraction via Deep Reinforcement Learning , 2018, ACL.

[3]  John Langford,et al.  Mapping Instructions and Visual Observations to Actions with Reinforcement Learning , 2017, EMNLP.

[4]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[5]  William Yang Wang,et al.  MojiTalk: Generating Emotional Responses at Scale , 2017, ACL.

[6]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[7]  Xin Wang,et al.  Video Captioning via Hierarchical Reinforcement Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[9]  Ramakanth Pasunuru,et al.  Reinforced Video Captioning with Entailment Rewards , 2017, EMNLP.

[10]  Yu Zhang,et al.  Deep Reinforcement Learning for Chinese Zero Pronoun Resolution , 2018, ACL.

[11]  Bowen Zhou,et al.  Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents , 2018, IJCAI.

[12]  Jianfeng Gao,et al.  Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.

[13]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[14]  Xin Wang,et al.  No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling , 2018, ACL.

[15]  Jianfeng Gao,et al.  Deep Reinforcement Learning with a Natural Language Action Space , 2015, ACL.

[16]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[17]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[18]  Wenhan Xiong,et al.  DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning , 2017, EMNLP.

[19]  Regina Barzilay,et al.  Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning , 2016, EMNLP.

[20]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[21]  Christopher D. Manning,et al.  Deep Reinforcement Learning for Mention-Ranking Coreference Models , 2016, EMNLP.

[22]  Lei Li,et al.  Reinforced Co-Training , 2018, NAACL.

[23]  Jianfeng Gao,et al.  Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads , 2016, EMNLP.

[24]  Xin Wang,et al.  Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation , 2018, ECCV.