Building event-centric knowledge graphs from news

Knowledge graphs have gained increasing popularity in the past couple of years, thanks to their adoption in everyday search engines. Typically, they consist of fairly static and encyclopedic facts about persons and organizations (e.g., a celebrity's birth date, occupation and family members) obtained from large repositories such as Freebase or Wikipedia. In this paper, we present a method and tools to automatically build knowledge graphs from news articles. As news articles describe changes in the world through the events they report, we present an approach to create Event-Centric Knowledge Graphs (ECKGs) using state-of-the-art natural language processing and semantic web techniques. Such ECKGs capture long-term developments and histories on hundreds of thousands of entities and are complementary to the static encyclopedic information in traditional knowledge graphs. We describe our event-centric representation schema, the challenges in extracting event information from news, our open source pipeline, and the knowledge graphs we have extracted from four different news corpora: general news (Wikinews), the FIFA World Cup, the global automotive industry, and Airbus A380 airplanes. Furthermore, we present an assessment of the accuracy of the pipeline in extracting the triples of the knowledge graphs. Finally, through an event-centered browser and visualization tool, we show how approaching information from news in an event-centric manner can increase the user's understanding of the domain, facilitate the reconstruction of news story lines, and enable exploratory investigation of facts hidden in the news.
