Transformer Decoder Based Reinforcement Learning Approach for Conversational Response Generation

Developing a machine that can hold an engaging conversation with a human is one of the main challenges in designing a dialogue system in the field of natural language processing. Responses generated by neural conversational models trained with log-likelihood objectives tend to lack informativeness and diversity. We address this limitation of log-likelihood training in dialogue generation models and present the Reinforce Transformer decoder model, a new approach for training Transformer-decoder-based conversational models that combines the proximal policy optimization technique from reinforcement learning with the Transformer decoder architecture. We focus on multi-turn dialogue response generation and evaluate the proposed framework on the Reddit dialogues data, a real-world human-to-human dataset. Experiments show that our dialogue response generation model achieves significant improvements over recurrent sequence-to-sequence models as well as state-of-the-art Transformer-based dialogue generation models on diversity and relevance evaluation metrics.
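To make the training setup more concrete, the sketch below shows one way a PPO-style clipped surrogate loss can be attached to a decoder-only Transformer policy for response generation. It is a minimal illustration only: the model sizes, the advantage signal, and all class and function names (DecoderPolicy, ppo_clip_loss) are assumptions for this example and are not taken from the paper.

```python
import torch
import torch.nn as nn


class DecoderPolicy(nn.Module):
    """Small decoder-only Transformer language model used as the dialogue policy.

    Built from encoder layers plus a causal mask (the usual GPT-style trick),
    so no separate encoder memory is needed. Requires PyTorch >= 1.9 for
    batch_first=True. All hyperparameters here are illustrative.
    """

    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Additive causal mask: -inf above the diagonal blocks attention to
        # future positions, 0 elsewhere leaves past positions visible.
        seq_len = tokens.size(1)
        mask = torch.full((seq_len, seq_len), float("-inf")).triu(1)
        h = self.decoder(self.embed(tokens), mask=mask)
        return self.lm_head(h)  # (batch, seq_len, vocab)


def ppo_clip_loss(new_logits, old_logits, actions, advantages, clip_eps=0.2):
    """Token-level PPO clipped surrogate loss over sampled response tokens."""
    new_logp = torch.log_softmax(new_logits, -1).gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    old_logp = torch.log_softmax(old_logits, -1).gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    ratio = torch.exp(new_logp - old_logp.detach())
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    policy = DecoderPolicy()
    old_policy = DecoderPolicy()
    old_policy.load_state_dict(policy.state_dict())  # snapshot of the behavior policy

    tokens = torch.randint(0, 32000, (2, 16))        # context + sampled response (toy data)
    inputs, actions = tokens[:, :-1], tokens[:, 1:]  # logits at position t score token t+1
    advantages = torch.randn(2, 15)                  # placeholder advantage per token

    with torch.no_grad():
        old_logits = old_policy(inputs)
    loss = ppo_clip_loss(policy(inputs), old_logits, actions, advantages)
    loss.backward()
    print(float(loss))
```

In a full training loop the placeholder advantages would come from a reward on the generated response (e.g. a relevance or diversity score) combined with a learned baseline or generalized advantage estimation; those choices are left open here because the abstract does not specify them.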
