Transfer Learning based Task-oriented Dialogue Policy for Multiple Domains using Hierarchical Reinforcement Learning

Developing Virtual Agents (VAs) for goal/task-oriented conversations that can handle complex tasks spanning multiple domains and their various intents is an onerous task. The lack of high-quality, domain-specific conversational data for training policies is one of the biggest challenges for any dialogue system. In this paper, we present a multi-domain, multi-intent task-oriented dialogue system that combines the Hierarchical Deep Reinforcement Learning and Transfer Learning paradigms. The idea is to exploit the resemblance between domains, since different domains share a considerable amount of overlapping data or slots. Accordingly, the Options framework is employed together with Transfer Learning to build VAs that learn better and faster. Our proposed approach reduced the data required to train multi-domain VAs by at least 20% for distant domains and almost 38% for close domains. It also significantly reduced the learning time of the transfer-learning-based policies.
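
To make the slot-overlap idea concrete, the following is a minimal sketch, not the method of the paper: the abstract does not say which parameters are transferred or how. It assumes, purely for illustration, that a policy keeps one weight row per slot, so rows for slots shared with a source domain can be copied and the rest initialised randomly. The slot names, the helper functions `shared_slots` and `init_target_weights`, and the weight layout are all hypothetical.

```python
import random

def shared_slots(source_slots, target_slots):
    """Return the slots present in both domains, in target-domain order."""
    source = set(source_slots)
    return [slot for slot in target_slots if slot in source]

def init_target_weights(source_slots, source_weights, target_slots,
                        hidden_size, seed=0):
    """Build a per-slot weight table for the target domain.

    Rows for slots shared with the source domain are copied (transferred);
    rows for slots unique to the target domain are randomly initialised.
    This per-slot layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    source_row = dict(zip(source_slots, source_weights))
    target_weights = []
    for slot in target_slots:
        if slot in source_row:
            target_weights.append(list(source_row[slot]))
        else:
            target_weights.append(
                [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            )
    return target_weights

# Hypothetical slot inventories for two related domains.
restaurant_slots = ["city", "date", "time", "num_people", "cuisine"]
movie_slots = ["city", "date", "time", "num_people", "movie_name", "theater"]

hidden_size = 3
# Stand-in for weights learned on the source (restaurant) domain.
restaurant_weights = [[float(i)] * hidden_size
                      for i in range(len(restaurant_slots))]

overlap = shared_slots(restaurant_slots, movie_slots)
overlap_ratio = len(overlap) / len(movie_slots)
movie_weights = init_target_weights(
    restaurant_slots, restaurant_weights, movie_slots, hidden_size
)
```

Under this assumption, the more slots two domains share, the more of the target table starts from learned values, which is one intuition for why closer domains might need less additional training data.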
