Why Do Neural Response Generation Models Prefer Universal Replies?

Recent advances in sequence-to-sequence learning have enabled a purely data-driven approach to the response generation task. Despite their diverse applications, existing neural models are prone to producing short and generic replies, which makes them poorly suited to open-domain conversation. In this research, we analyze this critical issue in light of the model's optimization goal and the specific characteristics of human-to-human dialog corpora. By decomposing the black box into parts, we conduct a detailed analysis of the probability limit to reveal the reason behind these universal replies. Based on this analysis, we propose a max-margin ranking regularization term that discourages the model from favoring these replies. Finally, empirical experiments on case studies and benchmarks with several metrics validate this approach.
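
The idea of a max-margin ranking regularizer can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the hinge form, the function names `max_margin_ranking_penalty` and `regularized_loss`, and the `margin` and `weight` parameters are hypothetical and are not taken from the paper. The sketch penalizes the model whenever the log-probability it assigns to the reference reply does not exceed the log-probability of a generic reply by at least a fixed margin.

```python
def max_margin_ranking_penalty(logp_reference, logp_generic, margin=1.0):
    """Hinge-style ranking penalty (illustrative assumption, not the paper's exact term).

    logp_reference: model log-probability of the ground-truth reply.
    logp_generic:   model log-probability of a universal (generic) reply.
    Returns 0 when the reference outscores the generic reply by at least `margin`.
    """
    return max(0.0, margin - (logp_reference - logp_generic))


def regularized_loss(nll, logp_reference, logp_generic, weight=0.5, margin=1.0):
    """Negative log-likelihood plus the weighted ranking penalty.

    `weight` is a hypothetical trade-off coefficient.
    """
    penalty = max_margin_ranking_penalty(logp_reference, logp_generic, margin)
    return nll + weight * penalty


if __name__ == "__main__":
    # Reference reply already ranked well above the generic one: no penalty.
    print(max_margin_ranking_penalty(-2.0, -5.0))
    # Generic reply scored higher than the reference: the penalty is positive.
    print(max_margin_ranking_penalty(-5.0, -2.0))
```

In this sketch the penalty is zero once the reference reply is ranked above the generic reply by the margin, so it only alters training for examples where the model still prefers the universal reply.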
