Principles for Developing a Knowledge Graph of Interlinked Events from News Headlines on Twitter

The ever-growing datasets published on Linked Open Data mainly contain encyclopedic information. However, there is a lack of high-quality, structured, and semantically annotated datasets extracted from unstructured real-time sources. In this paper, we present principles for developing a knowledge graph of interlinked events, using as a case study news headlines published on Twitter, which is a real-time and eventful source of fresh information. We describe the essential pipeline, whose tasks range from choosing a background data model, through event annotation (i.e., event recognition and classification) and entity annotation, to interlinking events. The state of the art is limited to domain-specific scenarios for recognizing and classifying events, whereas this paper serves as a domain-agnostic roadmap for developing a knowledge graph of interlinked events.
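As an illustration only, the four pipeline stages named above (data model, event annotation, entity annotation, interlinking) can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example and is not the method of the paper: the `Event` class stands in for a background data model, the `TRIGGERS` lexicon and its event types are hypothetical, entities are approximated by capitalized tokens, and two events are linked when they share at least one entity mention.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Set, Tuple

# Hypothetical trigger lexicon: maps a trigger word to an illustrative event type.
TRIGGERS = {"wins": "Competition", "dies": "Death", "visits": "Meeting"}


@dataclass
class Event:
    """Toy stand-in for a background data model of an event."""
    headline: str
    event_type: str
    entities: Set[str] = field(default_factory=set)


def annotate_event(headline: str) -> Optional[str]:
    """Event recognition and classification via a simple trigger lookup."""
    for token in headline.lower().split():
        if token in TRIGGERS:
            return TRIGGERS[token]
    return None  # no trigger found: the headline is not treated as an event


def annotate_entities(headline: str) -> Set[str]:
    """Toy entity annotation: capitalized tokens count as entity mentions."""
    return {tok.strip(".,") for tok in headline.split() if tok[:1].isupper()}


def build_graph(headlines: List[str]) -> Tuple[List[Event], List[Tuple[int, int]]]:
    """Run the stages in order and link events that share an entity mention."""
    events = []
    for headline in headlines:
        event_type = annotate_event(headline)
        if event_type is not None:
            events.append(Event(headline, event_type, annotate_entities(headline)))
    links = [
        (i, j)
        for i, j in combinations(range(len(events)), 2)
        if events[i].entities & events[j].entities
    ]
    return events, links


events, links = build_graph([
    "Obama visits Berlin",
    "Merkel wins election in Berlin",
    "Rain expected tomorrow",
])
```

In this toy run the third headline contains no trigger and is dropped, while the first two are linked through the shared mention "Berlin". A realistic system would replace each function with a trained recognizer, an entity linker against a knowledge base, and a richer interlinking criterion.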
