Is the Understanding of Explicit Discourse Relations Required in Machine Reading Comprehension?

An in-depth analysis of the level of language understanding required by existing Machine Reading Comprehension (MRC) benchmarks can provide insight into the reading capabilities of machines. In this paper, we propose an ablation-based methodology to assess the extent to which MRC datasets evaluate the understanding of explicit discourse relations. We define seven MRC skills, each requiring the understanding of a different discourse relation, and introduce ablation methods that verify whether these skills are needed to succeed on a dataset. By observing the drop in performance of neural MRC models evaluated on the original and the modified dataset, we can measure the degree to which the dataset requires these skills. Experiments on three large-scale datasets with the BERT-base and ALBERT-xxlarge models show that the relative performance changes for all skills are small (less than 6%). These results imply that most answered questions in the examined datasets do not require understanding the discourse structure of the text. To specifically probe for natural language understanding, more challenging benchmarks are needed that can correctly evaluate the intended skills.
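The core measurement described above can be sketched as a simple relative-change computation between a model's score on the original dataset and its score on the ablated version. This is an illustrative sketch, not the authors' code; the function name and the example scores are hypothetical.

```python
# Minimal sketch of the ablation comparison: a small relative drop
# suggests the ablated discourse skill is rarely required to answer.
def relative_drop(original_score: float, ablated_score: float) -> float:
    """Relative change in evaluation score after ablating a skill."""
    return (original_score - ablated_score) / original_score

# Hypothetical example: 80.0 F1 on the original dataset, 76.4 F1 after
# ablating explicit discourse connectives -> a 4.5% relative drop.
print(round(relative_drop(80.0, 76.4), 3))
```

Under the paper's reading, a relative drop below roughly 6% would indicate that the corresponding discourse skill contributes little to solving the benchmark.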
