Multi-Sentence Knowledge Selection in Open-Domain Dialogue

Incorporating external knowledge sources effectively into conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes brittle assumptions about knowledge sources to simplify the overall task, such as assuming a single relevant knowledge sentence per context. In this work, we evaluate the current state of open-domain conversational knowledge selection, showing where existing data collection and evaluation methodologies fall short. We then improve on them by proposing a new framework for collecting relevant knowledge, and use it to create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset using both intrinsic metrics and extrinsic measures of response quality, showing that neural rerankers trained on WOW++ can outperform rankers trained on standard datasets.
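To make the knowledge ranking setup concrete, the sketch below shows one common way to score candidate knowledge sentences against a dialogue context with a cross-encoder built on RoBERTa. This is illustrative only, not the paper's exact architecture or training setup: the model checkpoint (`roberta-base` as an untrained placeholder; in practice a relevance-fine-tuned checkpoint would be used) and the `rank_knowledge` helper are assumptions introduced for this example.

```python
# Minimal sketch of a cross-encoder knowledge reranker, assuming the
# Hugging Face `transformers` library. The checkpoint name is a placeholder;
# a real reranker would be fine-tuned for (context, knowledge) relevance.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def rank_knowledge(context: str, candidates: list, model_name: str = "roberta-base"):
    """Score each candidate knowledge sentence against the dialogue context
    and return candidates sorted from most to least relevant."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
    model.eval()

    # Encode (context, candidate) pairs jointly so the model can attend across both.
    inputs = tokenizer(
        [context] * len(candidates),
        candidates,
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits        # shape: (num_candidates, 2)
    relevance = logits.softmax(dim=-1)[:, 1]   # probability of the "relevant" class

    return sorted(zip(candidates, relevance.tolist()), key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    # Under a multi-sentence setup like WOW++, several candidates may be relevant.
    context = "I love hiking in national parks. Have you been to Yosemite?"
    candidates = [
        "Yosemite National Park is in California's Sierra Nevada mountains.",
        "Yosemite is known for its giant sequoia trees and El Capitan.",
        "The Eiffel Tower is located in Paris, France.",
    ]
    for sentence, score in rank_knowledge(context, candidates):
        print(f"{score:.3f}  {sentence}")
```

In a multi-sentence setting, the ranked scores can be thresholded to keep every sufficiently relevant sentence rather than selecting only the top-1 candidate.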
