Neural Language Modeling for Contextualized Temporal Graph Generation

This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document. Despite the huge success of neural pre-training methods in NLP tasks, its potential for temporal reasoning over event graphs has not been sufficiently explored. Part of the reason is the difficulty in obtaining large training corpora with human-annotated events and temporal links. We address this challenge by using existing IE/NLP tools to automatically generate a large quantity (89,000) of system-produced document-graph pairs, and propose a novel formulation of the contextualized graph generation problem as a sequence-to-sequence mapping task. These strategies enable us to leverage and fine-tune pre-trained language models on the system-induced training data for the graph generation task. Our experiments show that our approach is highly effective in generating structurally and semantically valid graphs. Further, evaluation on a challenging hand-labeled, out-domain corpus shows that our method outperforms the closest existing method by a large margin on several metrics. Code and pre-trained models are available at this https URL.

[1]  Yejin Choi,et al.  The Curious Case of Neural Text Degeneration , 2019, ICLR.

[2]  Helena Ahonen-Myka,et al.  Topic Detection and Tracking with Spatio-Temporal Evidence , 2003, ECIR.

[3]  Jet Hoek,et al.  On Temporality in Discourse Annotation: Theoretical and Practical Considerations , 2017, Dialogue Discourse.

[4]  Yejin Choi,et al.  ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning , 2019, AAAI.

[5]  Catherine Havasi,et al.  ConceptNet 5.5: An Open Multilingual Graph of General Knowledge , 2016, AAAI.

[6]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[7]  Alon Lavie,et al.  Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems , 2011, WMT@EMNLP.

[8]  Jean-Yves Ramel,et al.  An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems , 2015, ICPRAM.

[9]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[10]  Aric Hagberg,et al.  Exploring Network Structure, Dynamics, and Function using NetworkX , 2008, Proceedings of the Python in Science Conference.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Mario Vento,et al.  An Improved Algorithm for Matching Large Graphs , 2001 .

[13]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[14]  Emden R. Gansner,et al.  Drawing graphs with dot , 2006 .

[15]  James F. Allen,et al.  Interpreting the temporal aspects of language , 2012 .

[16]  Jure Leskovec,et al.  GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models , 2018, ICML.

[17]  Dan Roth,et al.  An Improved Neural Baseline for Temporal Relation Extraction , 2019, EMNLP.

[18]  Yejin Choi,et al.  COMET: Commonsense Transformers for Automatic Knowledge Graph Construction , 2019, ACL.

[19]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[21]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[22]  M E J Newman,et al.  Fast algorithm for detecting community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[23]  Nanyun Peng,et al.  TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions , 2020, EMNLP.

[24]  James Pustejovsky,et al.  SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations , 2013, *SEMEVAL.

[25]  Daniel S. Weld,et al.  Temporal Information Extraction , 2010, AAAI.

[26]  Yaser Al-Onaizan,et al.  Severing the Edge between before and after: Neural Architectures for Temporal Ordering of Events , 2020, EMNLP.

[27]  James F. Allen Maintaining knowledge about temporal intervals , 1983, CACM.

[28]  Ivan Vulić,et al.  Hello, It’s GPT-2 - How Can I Help You? Towards the Use of Pretrained Language Models for Task-Oriented Dialogue Systems , 2019, EMNLP.

[29]  Taylor Cassidy,et al.  An Annotation Framework for Dense Event Ordering , 2014, ACL.

[30]  Dan Roth,et al.  CogCompTime: A Tool for Understanding Time in Natural Language , 2018, EMNLP.

[31]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[32]  Chen Lin,et al.  Multilayered temporal modeling for the clinical domain , 2016, J. Am. Medical Informatics Assoc..

[33]  R'emi Louf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[34]  Benjamin Van Durme,et al.  Fine-Grained Temporal Relation Extraction , 2019, ACL.

[35]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[36]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[37]  Philip M. McCarthy,et al.  Using temporal cohesion to predict temporal coherence in narrative and expository texts , 2007, Behavior research methods.

[38]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[39]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[40]  Nathanael Chambers,et al.  Unsupervised Learning of Narrative Schemas and their Participants , 2009, ACL.

[41]  Dan Roth,et al.  A Structured Learning Approach to Temporal Relation Extraction , 2017, EMNLP.

[42]  Tanya Goyal,et al.  Embedding time expressions for deep temporal ordering models , 2019, ACL.

[43]  Taylor Cassidy,et al.  Dense Event Ordering with a Multi-Pass Architecture , 2014, TACL.

[44]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[45]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[46]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..

[47]  Teruko Mitamura,et al.  Automatic Event Salience Identification , 2018, EMNLP.

[48]  Hannes Schulz,et al.  Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation , 2017, ArXiv.

[49]  Eduard H. Hovy,et al.  A Typology of Near-Identity Relations for Coreference (NIDENT) , 2010, LREC.

[50]  Nathanael Chambers,et al.  Unsupervised Learning of Narrative Event Chains , 2008, ACL.

[51]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.