Paper Information - Towards a Human-like Open-Domain Chatbot - 字舞流文

Towards a Human-like Open-Domain Chatbot

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
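The two quantities the abstract relates, perplexity and SSA, can be illustrated with a minimal sketch. It assumes perplexity is the exponential of the mean negative log-likelihood per token, and that SSA is the plain average of the fraction of responses labeled sensible and the fraction labeled specific, with a response counted as specific only if it is also sensible. The function names and the input layout are hypothetical and chosen only for illustration; they are not taken from the paper's code.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities assigned to each target token.

    Assumption: perplexity = exp(mean negative log-likelihood per token).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ssa(labels):
    """Sensibleness and Specificity Average over labeled responses.

    `labels` is a hypothetical layout: one (sensible, specific) pair of
    booleans per response. A response is counted as specific only when it
    is also sensible (an assumption about the labeling protocol).
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return (sensibleness + specificity) / 2


if __name__ == "__main__":
    # Four tokens, each predicted with probability 0.25.
    print(perplexity([math.log(0.25)] * 4))
    # Four responses: three sensible, two of those also specific.
    print(ssa([(True, True), (True, False), (False, False), (True, True)]))
```

Under these assumptions, a model that spreads its probability uniformly over four candidate tokens has a perplexity of about 4, and a bot whose responses are 75% sensible and 50% specific scores an SSA of 62.5%.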

Quoc V. Le | David R. So | Minh-Thang Luong | Noah Fiedel | Daniel De Freitas | Jamie Hall | Romal Thoppilan | Zi Yang | Apoorv Kulshreshtha | Gaurav Nemade | Yifeng Lu
