Evaluation Metrics for Machine Reading Comprehension: Prerequisite Skills and Readability

Knowing the quality of reading comprehension (RC) datasets is important for the development of natural-language understanding systems. In this study, two classes of metrics were adopted for evaluating RC datasets: prerequisite skills and readability. We applied these classes to six existing datasets, including MCTest and SQuAD, and highlighted the characteristics of the datasets according to each metric and the correlation between the two classes. Our dataset analysis suggests that the readability of RC datasets does not directly affect the question difficulty and that it is possible to create an RC dataset that is easy to read but difficult to answer.
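The correlation between the two metric classes can be illustrated with a minimal sketch. It assumes each question has already been annotated with a count of required prerequisite skills and a passage-level readability score. The function name `pearson` and the choice of Pearson's coefficient are illustrative assumptions, not the study's actual procedure, and the numbers in the usage lines are toy values, not results from any dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-question annotations (toy values for illustration only):
# a readability grade level of the passage and the number of prerequisite
# skills needed to answer the question.
readability_grade = [1.0, 2.0, 3.0, 4.0]
skills_required = [3, 1, 1, 3]

r = pearson(readability_grade, skills_required)
print(f"Pearson r = {r:.2f}")
```

A coefficient near zero on real annotations would be consistent with the claim that readability does not directly determine question difficulty; a strongly positive value would argue against it.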
