TextGraphs 2021 Shared Task on Multi-Hop Inference for Explanation Regeneration

The Shared Task on Multi-Hop Inference for Explanation Regeneration asks participants to compose large multi-hop explanations for questions by assembling chains of facts from a supporting knowledge base. While previous editions of this shared task aimed to evaluate explanatory completeness – finding a set of facts that form a complete inference chain, without gaps, from question to correct answer – the 2021 instantiation concentrates on the subtask of determining relevance in large multi-hop explanations. To this end, this edition of the shared task makes use of a large set of approximately 250k manual explanatory relevance ratings that augment the 2020 shared task data. In this summary paper, we describe the details of the explanation regeneration task, the evaluation data, and the participating systems. Additionally, we perform a detailed analysis of the participating systems, evaluating various aspects of the multi-hop inference process. The best-performing system achieved an NDCG of 0.82 on this challenging task, substantially increasing performance over baseline methods by 32% while leaving significant room for future improvement.
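Systems are scored with NDCG, which rewards rankings that place highly rated facts near the top and normalises by the best achievable ordering. The following is a minimal sketch of the standard NDCG computation; the function names and example relevance ratings are illustrative and do not reproduce the official shared-task scorer.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each gain is discounted by log2 of its rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """Normalised DCG: DCG of the system ranking divided by the ideal DCG."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# A system ranks five candidate facts; gold relevance ratings listed in rank order.
system_ranking = [3, 2, 3, 0, 1]
print(ndcg(system_ranking))
```

A ranking that already lists facts in descending order of gold relevance scores exactly 1.0; misplacing a highly relevant fact lower in the list reduces the score.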
