TextGraphs 2019 Shared Task on Multi-Hop Inference for Explanation Regeneration

While automated question answering systems are increasingly able to retrieve answers to natural language questions, their ability to generate detailed human-readable explanations for those answers remains limited. The Shared Task on Multi-Hop Inference for Explanation Regeneration asks participants to regenerate detailed gold explanations for standardized elementary science exam questions by selecting facts from a knowledge base of semi-structured tables. Each explanation contains between 1 and 16 interconnected facts that form an “explanation graph” spanning core scientific knowledge and detailed world knowledge. Successfully combining these facts to generate detailed explanations is expected to require advances in multi-hop inference and information combination, making use of the supervised training data provided by the WorldTree explanation corpus. The top-performing system achieved a mean average precision (MAP) of 0.56, substantially advancing the state of the art over a baseline information retrieval model. Detailed analyses of all submitted systems showed large relative improvements on the most challenging multi-hop inference problems, although absolute performance remained low, highlighting the difficulty of generating detailed explanations through multi-hop reasoning.
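For readers unfamiliar with the evaluation, the sketch below illustrates how mean average precision is computed over ranked explanation facts. It is a minimal illustration, not the official shared-task evaluation script; the fact identifiers and rankings are hypothetical.

```python
from typing import Dict, List, Set

def average_precision(ranked_fact_ids: List[str], gold_fact_ids: Set[str]) -> float:
    """Average precision of one ranked fact list against the gold explanation facts."""
    if not gold_fact_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, fact_id in enumerate(ranked_fact_ids, start=1):
        if fact_id in gold_fact_ids:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant rank
    return precision_sum / len(gold_fact_ids)

def mean_average_precision(predictions: Dict[str, List[str]],
                           gold: Dict[str, Set[str]]) -> float:
    """MAP over all questions: mean of per-question average precision."""
    return sum(average_precision(predictions[q], gold[q]) for q in gold) / len(gold)

# Hypothetical example: one question whose gold explanation has three facts.
preds = {"q1": ["f7", "f2", "f9", "f4", "f1"]}
gold = {"q1": {"f2", "f4", "f5"}}
print(mean_average_precision(preds, gold))  # ~0.33 for this toy ranking
```

A system that ranks all gold explanation facts above all other knowledge-base facts for every question would score a MAP of 1.0 under this metric.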
