MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

We propose MultiDoc2Dial, a new task and dataset for modeling goal-oriented dialogues grounded in multiple documents. Most previous work treats document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. In this work, we aim to address more realistic scenarios in which a goal-oriented information-seeking conversation involves multiple topics and is hence grounded in different documents. To facilitate such a task, we introduce a new dataset that contains dialogues grounded in multiple documents from four different domains. We also explore modeling the dialogue-based and document-based context in the dataset. We present strong baseline approaches and various experimental results, aiming to support further research efforts on such a task.
