ESTER: A Machine Reading Comprehension Dataset for Reasoning about Event Semantic Relations

Understanding how events are semantically related to each other is the essence of reading comprehension. Recent event-centric reading comprehension datasets focus mostly on event arguments or temporal relations. While these tasks partially evaluate machines’ ability to understand narratives, human-like reading comprehension requires the capability to process event-based information beyond arguments and temporal reasoning. For example, to understand causality between events, we need to infer motivation or purpose; to establish event hierarchy, we need to understand the composition of events. To facilitate these tasks, we introduce **ESTER**, a comprehensive machine reading comprehension (MRC) dataset for Event Semantic Relation Reasoning. The dataset uses natural language queries to reason about the five most common event semantic relations, provides more than 6K questions, and captures 10.1K event relation pairs. Experimental results show that current SOTA systems achieve 22.1%, 63.3%, and 83.5% on token-based exact-match (**EM**), **F1**, and event-based **HIT@1** scores, respectively. These are all significantly below human performance (36.0%, 79.6%, and 100%, respectively), highlighting our dataset as a challenging benchmark.
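
To make the token-based metrics concrete, the sketch below computes exact match and F1 for one predicted answer against one gold answer. It is a minimal illustration only, assuming lowercased whitespace tokenization and bag-of-tokens overlap in the style of common extractive QA evaluation scripts; the abstract does not specify the dataset's actual scoring procedure, and the function names `tokenize`, `exact_match`, and `token_f1` are hypothetical. The event-based HIT@1 score is not sketched here, since the abstract does not define it.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; the real evaluation may differ.
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    # 1.0 only when the token sequences are identical, otherwise 0.0.
    return float(tokenize(prediction) == tokenize(gold))


def token_f1(prediction: str, gold: str) -> float:
    # Harmonic mean of token-level precision and recall (bag-of-tokens overlap).
    pred_tokens = tokenize(prediction)
    gold_tokens = tokenize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that contains the whole gold answer plus extra tokens scores 0 on EM but receives partial credit on F1, which is consistent with the large gap between the EM and F1 figures reported above.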
