Automatic Classification and Relationship Extraction for Multi-Lingual and Multi-Granular Events from Wikipedia

Wikipedia is a rich data source for knowledge from all domains. As part of this knowledge, historical and daily events (news) are collected for different languages on special pages and in event portals. As only a small number of these events are available in structured form in DBpedia, we extract them from Wikipedia pages with a rule-based approach. In this paper we focus on three aspects: (1) extending our prior method to extract events at daily granularity, (2) automatically classifying events, and (3) finding relationships between events. As a result, we have extracted a data set of about 170,000 events covering different languages and granularities. Using one language set as a basis, we have automatically derived categories for about 70% of the events in another language set. For nearly every event, we have been able to find related events.
