Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

Machine reading comprehension tasks require a machine reader to answer questions relevant to a given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 free-form multiple-choice questions, collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best-performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study how distractor plausibility, and data augmentation with translated versions of relevant English datasets, affect model performance. We expect C3 to present great challenges to existing systems, since answering 86.8% of the questions requires knowledge both within and beyond the accompanying document, and we hope that C3 can serve as a platform for studying how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text. C3 is available at https://dataset.org/c3/.
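The abstract mentions rule-based baselines without describing them. As a hedged illustration only, the following is a minimal Python sketch of one widely used rule-based approach to multiple-choice reading comprehension, the sliding-window lexical-matching baseline introduced with MCTest (Richardson et al., 2013). It also includes the accuracy metric, which is assumed here to be the measure behind the reported 68.5% and 96.0%. This is not the paper's implementation. The character-level tokenizer, the punctuation list, and the function names `tokenize`, `window_score`, `predict`, and `accuracy` are all hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import List, Sequence, Set

# Assumption: a small, hypothetical set of Chinese and ASCII punctuation to skip.
PUNCT = set("，。？！、：；“”（）,.?!:;\"'()")


def tokenize(text: str) -> List[str]:
    # Assumption: character-level tokens, since Chinese text is not space-delimited.
    return [ch for ch in text if not ch.isspace() and ch not in PUNCT]


def window_score(doc: Sequence[str], target: Set[str]) -> float:
    """Best inverse-count-weighted overlap between `target` and any window of size |target|."""
    counts = Counter(doc)
    size = max(len(target), 1)
    best = 0.0
    for start in range(max(len(doc) - size + 1, 1)):
        window = doc[start:start + size]
        # Rarer document tokens get larger weights: log(1 + 1 / count).
        score = sum(math.log(1 + 1 / counts[tok]) for tok in window if tok in target)
        best = max(best, score)
    return best


def predict(document: str, question: str, options: List[str]) -> int:
    """Return the index of the option whose question+option tokens best match a document window."""
    doc = tokenize(document)
    q = set(tokenize(question))
    scores = [window_score(doc, q | set(tokenize(opt))) for opt in options]
    return max(range(len(options)), key=lambda i: scores[i])


def accuracy(predictions: List[int], gold: List[int]) -> float:
    """Fraction of questions whose predicted option index equals the gold index."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    doc = "男：你明天去北京吗？女：不去，我去上海。"
    pred = predict(doc, "女的明天去哪儿？", ["北京", "上海", "广州"])
    print(pred, accuracy([pred], [1]))  # → 0 0.0 (picks 北京; the answer is 上海)
```

On this toy dialogue, the baseline chooses 北京 because that option's characters appear right next to the question words in the document. The correct answer, 上海, depends on reading the negation 不去. The example shows how pure lexical overlap can fail on questions that require understanding beyond surface matching.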
