Unsupervised Learning of KB Queries in Task Oriented Dialogs

Task-oriented dialog (TOD) systems converse with users to accomplish a specific task. This task requires the system to query a knowledge base (KB) and use the retrieved results to fulfil user needs. Predicting the KB queries is crucial and can lead to severe under-performance if made incorrectly. KB queries are usually annotated in real-world datasets and are learnt using supervised approaches to achieve acceptable task completion. This need for query annotations prevents TOD systems from easily adapting to new domains. In this paper, we propose a novel problem of learning end-to-end TOD systems using dialogs that do not contain KB query annotations. Our approach first learns to predict the KB queries using reinforcement learning (RL) and then learns the end-to-end system using the predicted queries. However, predicting the correct query in TOD systems is uniquely plagued by correlated attributes, in which, due to data bias, certain attributes always occur together in the KB. This prevents the RL system to generalise and accuracy suffers as a result. We propose Correlated Attributes Resilient RL (CARRL), a modification to the RL gradient estimation, which mitigates the problem of correlated attributes and predicts KB queries better than existing weakly supervised approaches. Finally, we compare the performance of our end-to-end system trained using predicted queries to a system trained using annotated gold queries.

[1]  David Vandyke,et al.  Conditional Generation and Snapshot Learning in Neural Dialogue Systems , 2016, EMNLP.

[2]  Christopher D. Manning,et al.  Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.

[3]  Richard Socher,et al.  Global-to-local Memory Pointer Networks for Task-Oriented Dialogue , 2019, ICLR.

[4]  Ming Zhou,et al.  Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base , 2018, NeurIPS.

[5]  Geoffrey Zweig,et al.  Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning , 2017, ACL.

[6]  Nikhil Gupta,et al.  Disentangling Language and Knowledge in Task-Oriented Dialogs , 2018, NAACL.

[7]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[8]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[9]  Jianfeng Gao,et al.  Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access , 2016, ACL.

[10]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[11]  Matthew Henderson,et al.  Word-Based Dialog State Tracking with Recurrent Neural Networks , 2014, SIGDIAL Conference.

[12]  Joelle Pineau,et al.  How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.

[13]  Matthew Henderson,et al.  The Second Dialog State Tracking Challenge , 2014, SIGDIAL Conference.

[14]  Danish Contractor,et al.  2019 Formatting Instructions for Authors Using LaTeX , 2018 .

[15]  Hsien-Te Cheng,et al.  Algorithms for partially observable markov decision processes , 1989 .

[16]  Christopher D. Manning,et al.  Key-Value Retrieval Networks for Task-Oriented Dialogue , 2017, SIGDIAL Conference.

[17]  Pascale Fung,et al.  Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems , 2018, ACL.

[18]  David Vandyke,et al.  A Network-based End-to-End Trainable Task-oriented Dialogue System , 2016, EACL.

[19]  Ming-Wei Chang,et al.  Search-based Neural Structured Learning for Sequential Question Answering , 2017, ACL.

[20]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[21]  Chen Liang,et al.  Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision , 2016, ACL.

[22]  Jason Weston,et al.  Learning End-to-End Goal-Oriented Dialog , 2016, ICLR.

[23]  YoungSteve,et al.  Partially observable Markov decision processes for spoken dialog systems , 2007 .

[24]  Chen Liang,et al.  Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing , 2018, NeurIPS.