WikipEvent: Leveraging Wikipedia Edit History for Event Detection

Much existing work in information extraction assumes that the relationships stored in fixed knowledge bases are static. In collaborative environments such as Wikipedia, however, information and structure change continuously over time. In this work, we introduce a new method for extracting complex event structures from Wikipedia. We propose a new model that represents events through the multiple entities they involve and that generalizes to arbitrary languages. The evolution of an event is captured by analyzing the user edit history of Wikipedia. Our work provides a foundation for a novel class of evolution-aware, entity-based enrichment algorithms and considerably improves the quality of entity accessibility and temporal retrieval for Wikipedia. We formalize this problem and introduce an efficient end-to-end platform as a solution. We conduct comprehensive experiments on a real dataset of 1.8 million Wikipedia articles to show the effectiveness of the proposed solution. Our results demonstrate a precision of 70% when evaluated against manually annotated data. Finally, we compare our work with the well-established Current Event Portal of Wikipedia and find that our system, WikipEvent, using the Co-References method, can be used in a complementary way to deliver new and additional information about events.
