Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events

In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations. Our proposed models receive a pair of events within a span of text as input and identify the temporal relation between them (Before, After, Equal, or Vague). Given that a key challenge in this task is the scarcity of annotated data, our models rely on pretrained representations (i.e., RoBERTa, BERT, or ELMo), transfer and multi-task learning (by leveraging complementary datasets), and self-training techniques. Experiments on the MATRES dataset of English documents establish a new state of the art on this task.
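To make the task setup concrete, below is a minimal sketch of a pairwise temporal-relation classifier. It assumes a RoBERTa encoder from the Hugging Face transformers library and a simple linear head over the two event-trigger token representations, with the four MATRES labels; the class and variable names are illustrative assumptions, not the paper's exact architecture or training procedure (which additionally uses transfer, multi-task, and self-training).

```python
# Minimal sketch (assumption, not the authors' exact model): two event mentions in
# one text span are encoded with pretrained RoBERTa, their contextual token vectors
# are concatenated, and a linear head predicts one of the four MATRES relations.
import torch
import torch.nn as nn
from transformers import RobertaTokenizerFast, RobertaModel

LABELS = ["BEFORE", "AFTER", "EQUAL", "VAGUE"]

class PairwiseTemporalClassifier(nn.Module):
    def __init__(self, model_name: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Classify from the concatenated vectors of the two event-trigger tokens.
        self.head = nn.Linear(2 * hidden, len(LABELS))

    def forward(self, input_ids, attention_mask, e1_idx, e2_idx):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        tokens = out.last_hidden_state                    # (batch, seq_len, hidden)
        batch = torch.arange(tokens.size(0))
        e1 = tokens[batch, e1_idx]                        # event-1 token vectors
        e2 = tokens[batch, e2_idx]                        # event-2 token vectors
        return self.head(torch.cat([e1, e2], dim=-1))     # (batch, 4) logits

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = PairwiseTemporalClassifier()

text = "The company announced the merger after regulators approved the deal."
enc = tokenizer(text, return_tensors="pt")
# Event-trigger positions ("announced", "approved") are assumed to come from the
# annotation; here they are recovered from character offsets for illustration.
e1_idx = torch.tensor([enc.char_to_token(text.index("announced"))])
e2_idx = torch.tensor([enc.char_to_token(text.index("approved"))])

logits = model(enc["input_ids"], enc["attention_mask"], e1_idx, e2_idx)
print(LABELS[logits.argmax(-1).item()])  # untrained head, so the prediction is arbitrary
```

In this sketch the head would be trained with a standard cross-entropy loss over the four labels; the paper's additional transfer, multi-task, and self-training components are not shown.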
