Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Designing task-oriented dialogue systems is a challenging research topic, since it needs not only to generate utterances fulfilling user requests but also to guarantee the comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL), however, the bias in annotated system utterances remains as a bottleneck. Reinforcement learning (RL) deals with the problem through using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing works with RL showed that the comprehensibility of generated system utterances could be corrupted when improving the performance on fulfilling user requests. In o gur work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO, where the latent dialogue act is applied to avoid designing specific dialogue act representations; (2) train HDNO via hierarchical reinforcement learning (HRL), as well as suggest the asynchronous updates between dialogue policy and NLG during training to theoretically guarantee their convergence to a local maximizer; and (3) propose using a discriminator modelled with language models as an additional reward to further improve the comprehensibility. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, the datasets on multi-domain dialogues, in comparison with word-level E2E model trained with RL, LaRL and HDSA, showing improvements on the performance evaluated by automatic evaluation metrics and human evaluation. Finally, we demonstrate the semantic meanings of latent dialogue acts to show the ability of explanation.

[1]  Chong Wang,et al.  Subgoal Discovery for Hierarchical Dialogue Policy Learning , 2018, EMNLP.

[2]  Zhijian Ou,et al.  Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context , 2019, AAAI.

[3]  Kai Wang,et al.  Multi-Domain Dialogue Acts and Response Co-Generation , 2020, ACL.

[4]  Heriberto Cuayáhuitl,et al.  Hierarchical Reinforcement Learning for Spoken Dialogue Systems , 2009 .

[5]  Derek Chen,et al.  Decoupling Strategy and Generation in Negotiation Dialogues , 2018, EMNLP.

[6]  Maxine Eskénazi,et al.  Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models , 2019, NAACL.

[7]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[8]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[9]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[10]  Thomas G. Dietterich The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.

[11]  Heinz H. Bauschke,et al.  Convex Analysis and Monotone Operator Theory in Hilbert Spaces , 2011, CMS Books in Mathematics.

[12]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[13]  Jiahuan Pei,et al.  A Modular Task-oriented Dialogue System Using a Neural Mixture-of-Experts , 2019, ArXiv.

[14]  Alan Ritter,et al.  Data-Driven Response Generation in Social Media , 2011, EMNLP.

[15]  Mike Lewis,et al.  Hierarchical Text Generation and Planning for Strategic Dialogue , 2017, ICML.

[16]  Kam-Fai Wong,et al.  Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning , 2017, EMNLP.

[17]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[18]  Jianfeng Gao,et al.  Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning , 2018, EMNLP.

[19]  Maarten de Rijke,et al.  Retrospective and Prospective Mixture-of-Generators for Task-oriented Dialogue Response Generation , 2020, ECAI.

[20]  知秀 柴田 5分で分かる!? 有名論文ナナメ読み:Jacob Devlin et al. : BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .

[21]  Joelle Pineau,et al.  A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues , 2016, AAAI.

[22]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[23]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[24]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[25]  Ulrich Berger,et al.  Brown's original fictitious play , 2007, J. Econ. Theory.

[26]  Yann Dauphin,et al.  Deal or No Deal? End-to-End Learning of Negotiation Dialogues , 2017, EMNLP.

[27]  Steve J. Young,et al.  Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..

[28]  Eric P. Xing,et al.  Unsupervised Text Style Transfer using Language Models as Discriminators , 2018, NeurIPS.

[29]  Stuart J. Russell,et al.  Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[30]  Stefan Ultes,et al.  MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling , 2018, EMNLP.

[31]  Kam-Fai Wong,et al.  Integrating planning for task-completion dialogue policy learning , 2018, ACL.

[32]  O. H. Brownlee,et al.  ACTIVITY ANALYSIS OF PRODUCTION AND ALLOCATION , 1952 .

[33]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[34]  Wenhu Chen,et al.  Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention , 2019, ACL.

[35]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[36]  Richard Socher,et al.  A Simple Language Model for Task-Oriented Dialogue , 2020, NeurIPS.

[37]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[38]  L. Shapley,et al.  Potential Games , 1994 .

[39]  Jianfeng Gao,et al.  SOLOIST: Few-shot Task-Oriented Dialog with A Single Pre-trained Auto-regressive Model , 2020, ArXiv.

[40]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[41]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[42]  Stefan Ultes,et al.  Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning , 2017, SIGDIAL Conference.

[43]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[44]  Dilek Z. Hakkani-Tür,et al.  MultiWOZ 2.1: Multi-Domain Dialogue State Corrections and State Tracking Baselines , 2019, ArXiv.

[45]  Mihail Eric,et al.  MultiWOZ 2. , 2019 .

[46]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[47]  Natasha Jaques,et al.  Hierarchical Reinforcement Learning for Open-Domain Dialog , 2020, AAAI.

[48]  Steve J. Young,et al.  USING POMDPS FOR DIALOG MANAGEMENT , 2006, 2006 IEEE Spoken Language Technology Workshop.

[49]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[50]  Maxine Eskénazi,et al.  Structured Fusion Networks for Dialog , 2019, SIGdial.

[51]  Min-Yen Kan,et al.  Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures , 2018, ACL.

[52]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[53]  Hui Ye,et al.  The Hidden Information State Approach to Dialog Management , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[54]  K. K. Sahu,et al.  Normalization: A Preprocessing Stage , 2015, ArXiv.

[55]  Jiliang Tang,et al.  A Survey on Dialogue Systems: Recent Advances and New Frontiers , 2017, SKDD.

[56]  Jason Weston,et al.  Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.

[57]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[58]  R. Bellman A Markovian Decision Process , 1957 .

[59]  Maxine Eskénazi,et al.  Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning , 2016, SIGDIAL Conference.

[60]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[61]  Marilyn A. Walker,et al.  An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email , 2000, J. Artif. Intell. Res..

[62]  Yu Li,et al.  Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models , 2019, EACL.