Multi-Referenced Training for Dialogue Response Generation

In open-domain dialogue response generation, a dialogue context can be continued with diverse responses, and dialogue models should capture such one-to-many relations. In this work, we first analyze the training objective of dialogue models through the lens of Kullback-Leibler divergence (KLD) and show that the gap between the real-world probability distribution and the single-referenced data's probability distribution prevents the model from learning the one-to-many relations efficiently. We then explore multi-referenced training from two directions, sketched below. Data-wise, we generate diverse pseudo references from a powerful pretrained model to build multi-referenced data that better approximates the real-world distribution. Model-wise, we propose to equip variational models with an expressive prior, named the linear Gaussian model (LGM). Results of both automatic and human evaluation show that the methods yield significant improvements over the baselines.
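To make the KLD view concrete, the following is a minimal reconstruction of the standard argument (our sketch under ordinary maximum-likelihood assumptions, not necessarily the paper's exact derivation). Maximum-likelihood training of a response model $p_\theta(y \mid x)$ is, up to a constant independent of $\theta$, equivalent to minimizing

    \mathbb{E}_{x \sim p(x)} \left[ \mathrm{KL}\!\left( \tilde{p}(y \mid x) \,\|\, p_\theta(y \mid x) \right) \right],

where $\tilde{p}(y \mid x)$ is the empirical response distribution of the training corpus. With single-referenced data, $\tilde{p}(y \mid x)$ places all of its mass on the one recorded reference, whereas the real-world distribution $p^*(y \mid x)$ spreads probability over many acceptable responses, so the model is fit to a degenerate target. Multi-referenced data replaces $\tilde{p}$ with a closer approximation of $p^*$ and thereby shrinks this gap.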
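Data-wise, the pseudo-reference construction can be sketched as follows. This is a hypothetical implementation, assuming DialoGPT as the pretrained generator (via the Hugging Face transformers API) and nucleus sampling for diversity; the paper's actual generator, sampling scheme, and any filtering of low-quality samples may differ.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: DialoGPT stands in for the "powerful pretrained model".
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()

def pseudo_references(context: str, num_refs: int = 10, top_p: float = 0.9):
    # DialoGPT separates dialogue turns with the EOS token.
    input_ids = tokenizer.encode(context + tokenizer.eos_token, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            do_sample=True,                 # sample rather than decode greedily
            top_p=top_p,                    # nucleus sampling promotes diversity
            num_return_sequences=num_refs,  # several pseudo references per context
            max_new_tokens=40,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the generated continuation, dropping the context prefix.
    continuations = outputs[:, input_ids.shape[-1]:]
    return [tokenizer.decode(c, skip_special_tokens=True) for c in continuations]

# Each (context, pseudo reference) pair becomes one training example, so a
# single-referenced corpus is expanded into a multi-referenced one.
print(pseudo_references("How was your weekend?"))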
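Model-wise, the abstract names the prior (LGM) but does not define it, so the sketch below is only one plausible reading: a classic linear Gaussian model $z = A h + b + \epsilon$ with $h \sim \mathcal{N}(0, I)$, whose parameters $A$ and $b$ are predicted from the dialogue context. The class name, dimensions, and conditioning scheme are our assumptions, not the paper's specification.

import torch
import torch.nn as nn

class LGMPrior(nn.Module):
    # Hypothetical context-conditioned linear Gaussian prior for a CVAE-style
    # dialogue model; the paper's exact parameterization may differ.
    def __init__(self, ctx_dim: int, h_dim: int, z_dim: int):
        super().__init__()
        self.proj_A = nn.Linear(ctx_dim, z_dim * h_dim)  # context -> loading matrix A
        self.proj_b = nn.Linear(ctx_dim, z_dim)          # context -> offset b
        self.logvar = nn.Parameter(torch.zeros(z_dim))   # diagonal noise covariance
        self.h_dim, self.z_dim = h_dim, z_dim

    def sample(self, ctx: torch.Tensor) -> torch.Tensor:
        batch = ctx.size(0)
        A = self.proj_A(ctx).view(batch, self.z_dim, self.h_dim)
        b = self.proj_b(ctx)
        h = torch.randn(batch, self.h_dim, device=ctx.device)  # h ~ N(0, I)
        noise = torch.randn(batch, self.z_dim, device=ctx.device) * (0.5 * self.logvar).exp()
        return torch.einsum("bzh,bh->bz", A, h) + b + noise    # z = A h + b + eps

Because $z$ is a linear function of jointly Gaussian variables, $p(z \mid x)$ remains Gaussian with mean $b$ and covariance $A A^\top + \mathrm{diag}(e^{\text{logvar}})$, so KL terms against a Gaussian posterior stay tractable while the prior's mean and full covariance adapt to the context.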
