EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation

We introduce EventNarrative, a knowledge graph-to-text dataset from publicly available open-world knowledge graphs. Given the recent advances in event-driven Information Extraction (IE), and that prior research on graph-to-text only focused on entity-driven KGs, this paper focuses on event-centric data. However, our data generation system can still be adapted to other types of KG data. Existing large-scale datasets in the graph-to-text area are non-parallel, meaning there is a large disconnect between the KGs and text. The datasets that have a paired KG and text, are small scale and manually generated or generated without a rich ontology, making the corresponding graphs sparse. Furthermore, these datasets contain many unlinked entities between their KG and text pairs. EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text, six times larger than the current largest parallel dataset. It makes use of a rich ontology, all the KGs entities are linked to the text, and our manual annotations confirm a high data quality. Our aim is two-fold: to help break new ground in event-centric research where data is lacking and to give researchers a well-defined, large-scale dataset in order to better evaluate existing and future knowledge graph-to-text models. We also evaluate two types of baselines on EventNarrative: a graph-to-text specific model and two state-of-the-art language models, which previous work has shown to be adaptable to the knowledge graph-to-text domain.

[1]  Simon Gottschalk,et al.  EventKG: A Multilingual Event-Centric Temporal Knowledge Graph , 2018, ESWC.

[2]  Jianfeng Gao,et al.  UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training , 2020, ICML.

[3]  Danqi Chen,et al.  CoQA: A Conversational Question Answering Challenge , 2018, TACL.

[4]  Jason Weston,et al.  Personalizing Dialogue Agents: I have a dog, do you have pets too? , 2018, ACL.

[5]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[6]  Nicolas Usunier,et al.  Canonical Tensor Decomposition for Knowledge Base Completion , 2018, ICML.

[7]  David Grangier,et al.  Neural Text Generation from Structured Data with Application to the Biography Domain , 2016, EMNLP.

[8]  Jean Carletta,et al.  Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , 2005, ACL 2005.

[9]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[10]  Anja Jentzsch Linked Open Data Cloud , 2014 .

[11]  Kilian Q. Weinberger,et al.  BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.

[12]  Mohammadreza Armandpour,et al.  ChronoR: Rotation Based Temporal Knowledge Graph Embedding , 2021, AAAI.

[13]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[14]  Jason Weston,et al.  A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.

[15]  Alex Smola,et al.  Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs , 2019, ArXiv.

[16]  Timothy M. Hospedales,et al.  TuckER: Tensor Factorization for Knowledge Graph Completion , 2019, EMNLP.

[17]  Maja Popovic,et al.  chrF++: words helping character n-grams , 2017, WMT.

[18]  Maja Popovic,et al.  chrF: character n-gram F-score for automatic MT evaluation , 2015, WMT@EMNLP.

[19]  Hao Tian,et al.  ERNIE 2.0: A Continual Pre-training Framework for Language Understanding , 2019, AAAI.

[20]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[21]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[22]  Omer Levy,et al.  BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.

[23]  Bowen Zhou,et al.  Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.

[24]  Thomas Wolf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[25]  Bill Dolan,et al.  Grounded Response Generation Task at DSTC7 , 2019 .

[26]  Heng Ji,et al.  Describing a Knowledge Base , 2018, INLG.

[27]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..

[28]  Claire Cardie,et al.  Event Extraction by Answering (Almost) Natural Questions , 2020, EMNLP.

[29]  Daisy Zhe Wang,et al.  DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs , 2019, NeurIPS.

[30]  Zheng Zhang,et al.  GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation , 2020, COLING.

[31]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[32]  Simon Gottschalk,et al.  EventKG - the Hub of Event Knowledge on the Web - and Biographical Timeline Generation , 2019, Semantic Web.

[33]  Oriol Vinyals,et al.  WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset , 2021, Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15).

[34]  C. Lawrence Zitnick,et al.  CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[36]  Mari Ostendorf,et al.  Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction , 2018, EMNLP.

[37]  Zhiyuan Liu,et al.  Learning Entity and Relation Embeddings for Knowledge Graph Completion , 2015, AAAI.

[38]  Shashi Narayan,et al.  Creating Training Corpora for NLG Micro-Planners , 2017, ACL.

[39]  Gerhard Weikum,et al.  YAGO: A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames , 2016, SEMWEB.

[40]  Markus Krötzsch,et al.  Wikidata , 2014, Commun. ACM.

[41]  N. Norrick Twice-told tales: Collaborative narration of familiar stories , 1997, Language in Society.

[42]  Weinan Zhang,et al.  CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via Cycle Training , 2020, WEBNLG.

[43]  Mirella Lapata,et al.  Text Generation from Knowledge Graphs with Graph Transformers , 2019, NAACL.

[44]  Simon Gottschalk,et al.  HapPenIng: Happen, Predict, Infer - Event Series Completion in a Knowledge Graph , 2019, SEMWEB.

[45]  Jiawei Han,et al.  Document-Level Event Argument Extraction by Conditional Generation , 2021, NAACL.

[46]  Miguel E. Rodríguez,et al.  Temporal Reasoning Over Event Knowledge Graphs , 2018 .

[47]  To all authors , 1995 .

[48]  Sean P. O'Brien,et al.  Crisis Early Warning and Decision Support: Contemporary Approaches and Thoughts on Future Research , 2010 .

[49]  Iryna Gurevych,et al.  Investigating Pretrained Language Models for Graph-to-Text Generation , 2020, ArXiv.

[50]  Jens Lehmann,et al.  DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia , 2015, Semantic Web.

[51]  Richard Socher,et al.  Pointer Sentinel Mixture Models , 2016, ICLR.