Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation of the reasoning process from the question to the answer. Moreover, previous studies have revealed that many examples in existing multi-hop datasets do not actually require multi-hop reasoning to be answered. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses both structured and unstructured data. In our dataset, we introduce evidence information containing a reasoning path for each multi-hop question. The evidence information has two benefits: (i) it provides a comprehensive explanation for predictions, and (ii) it allows the reasoning skills of a model to be evaluated. We carefully design a pipeline and a set of templates for generating question-answer pairs, which guarantees the multi-hop steps and the quality of the questions. We also exploit the structured format of Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and that it ensures multi-hop reasoning is required.
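To make the idea of an evidence-annotated, template-generated question concrete, the following is a minimal sketch of how a two-hop (compositional) question, its answer, and its evidence path could be composed from Wikidata-style triples. The specific triples, the template string, and the helper function are illustrative assumptions, not the paper's actual generation pipeline.

```python
# Illustrative sketch: composing a 2-hop question, answer, and evidence path
# from Wikidata-style triples. Triples, template, and helper are assumptions
# for illustration only, not the dataset's real pipeline.
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


# Two triples that chain through a bridge entity ("Frank Borzage").
hop1 = Triple("The Mortal Storm", "director", "Frank Borzage")
hop2 = Triple("Frank Borzage", "place of birth", "Salt Lake City")

# A compositional template: the object of hop 1 becomes the subject of hop 2.
TEMPLATE = "Where was the director of {film} born?"


def compose_question(hop1: Triple, hop2: Triple) -> dict:
    """Build a question-answer pair whose evidence is the ordered chain of triples."""
    assert hop1.obj == hop2.subject, "hop-2 subject must equal the hop-1 answer"
    return {
        "question": TEMPLATE.format(film=hop1.subject),
        "answer": hop2.obj,
        # Evidence information: the reasoning path a model is expected to recover.
        "evidence": [astuple(hop1), astuple(hop2)],
    }


print(compose_question(hop1, hop2))
```

In this sketch, the "evidence" field records the ordered triples that connect the question entity to the answer, which is the kind of reasoning-path annotation the abstract describes.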
