Towards Enriching Responses with Crowd-sourced Knowledge for Task-oriented Dialogue

Task-oriented dialogue agents are built to assist users in completing various tasks, and generating appropriate responses for satisfactory task completion is the ultimate goal. Metrics such as success rate and inform rate have therefore been widely used as a convenient and straightforward way to evaluate generated responses. However, beyond task completion, several other factors that strongly affect user satisfaction remain under-explored. In this work, we analyze the agent behavior patterns that lead to higher user satisfaction scores. Based on the findings, we design a neural response generation model, EnRG. It combines the power of pre-trained GPT-2 in modeling response semantics with the merit of dual attention in making use of external crowd-sourced knowledge. Equipped with two gates via explicit dialogue act modeling, it effectively controls the usage of external knowledge sources in the form of both text and images. Extensive experiments with both automatic and human evaluation demonstrate that, while achieving comparable task completion, the proposed method generates responses that gain higher user satisfaction.
