Hierarchical Attention and Knowledge Matching Networks With Information Enhancement for End-to-End Task-Oriented Dialog Systems

Most current end-to-end task-oriented dialog systems are built on the sequence-to-sequence (Seq2seq) encoder-decoder framework. These systems sometimes produce responses that are grammatically correct but logically wrong, a failure usually caused by information mismatching. To address this problem, we introduce hierarchical attention and knowledge matching networks with information enhancement (HAMI) for task-oriented dialog systems. HAMI contains a hierarchical attention dialog encoder (HADE) that models dialogs at the word level and the sentence level separately, allowing it to focus on important words in the dialog history and to generate context-aware representations. HAMI also contains a dedicated knowledge matching module that detects entities and queries a knowledge base (KB). The dialog history and KB result representations then serve as guidance for system response generation. Finally, HAMI’s loss function is designed with an information regularization term that emphasizes the importance of entities. Experimental results show that HAMI improves entity F1 by 9.8% over vanilla Seq2seq and outperforms state-of-the-art models on both the entity F1 and dialog accuracy metrics.
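To make the two-level encoding idea concrete, the following is a minimal NumPy sketch of hierarchical attention over a dialog: each utterance is summarized by attending over its word vectors, then the dialog is summarized by attending over the utterance vectors. This is an illustrative simplification, not the paper's HADE; the dot-product scoring, the random word vectors, and the `word_query`/`sent_query` context vectors are all assumptions for the example (the actual model would use learned parameters on top of recurrent encoders).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(states, query):
    # dot-product attention: weight each state by its similarity
    # to the query, then return the weighted sum
    weights = softmax(states @ query)
    return weights @ states

# Toy dialog history: 2 utterances with 3 and 5 word vectors of dim 4
rng = np.random.default_rng(0)
dialog = [rng.normal(size=(3, 4)), rng.normal(size=(5, 4))]
word_query = rng.normal(size=4)  # hypothetical word-level context vector
sent_query = rng.normal(size=4)  # hypothetical sentence-level context vector

# Word level: attend over the words within each utterance
utt_vecs = np.stack([attend(u, word_query) for u in dialog])  # (2, 4)
# Sentence level: attend over the utterance summaries
context = attend(utt_vecs, sent_query)  # (4,) dialog representation
print(context.shape)
```

The point of the hierarchy is that unimportant words are down-weighted inside each utterance before whole utterances compete for attention, so the final `context` vector reflects both which words and which turns matter.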
