TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions

A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have practically no questions that test temporal phenomena, so systems trained on these benchmarks have no capacity to answer questions such as "what happened before/after [some event]?" We introduce TORQUE, a new English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships. Results show that RoBERTa-large achieves an exact-match score of 51% on the test set of TORQUE, about 30% behind human performance.
