Extracting Event-Related Information from Article Updates in Wikipedia

Wikipedia is widely considered the largest and most up-to-date online encyclopedia, and its content is continuously maintained by a community of contributors. In many cases, real-life events such as new scientific findings, resignations, deaths, or catastrophes trigger collaborative editing of articles about the affected entities, such as persons or countries. In this paper, we conduct an in-depth analysis of event-related updates in Wikipedia by examining different indicators of events, including language, meta annotations, and update bursts. We then study how these indicators can be employed to automatically detect event-related updates. Our experiments on event extraction, clustering, and summarization show promising results towards generating entity-specific news tickers and timelines.
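Update bursts are one of the event indicators named above. The sketch below illustrates one simple way such a burst might be flagged. It is not the paper's method. It assumes that per-day edit counts for a single article have already been extracted from its revision history. A day is flagged when its count exceeds the mean of the preceding `window` days plus `k` standard deviations. The function name `detect_bursts` and the parameters `window` and `k` are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def detect_bursts(daily_edits, window=7, k=2.0):
    """Return indices of days whose edit count exceeds mean + k * std
    of the preceding `window` days.

    Hypothetical illustration: `daily_edits` is assumed to be a list of
    per-day edit counts for one article, oldest day first.
    """
    bursts = []
    for i in range(window, len(daily_edits)):
        history = daily_edits[i - window:i]  # trailing baseline, excludes day i
        mu = mean(history)
        sigma = pstdev(history)
        # Floor the deviation at 1 edit so a perfectly flat history
        # does not turn tiny fluctuations into bursts.
        threshold = mu + k * max(sigma, 1.0)
        if daily_edits[i] > threshold:
            bursts.append(i)
    return bursts


if __name__ == "__main__":
    # Toy series: a quiet week, then a sudden spike in editing on days 7 and 8.
    counts = [2, 3, 2, 1, 3, 2, 2, 25, 30, 4, 3, 2]
    print(detect_bursts(counts))
```

With this toy series the function returns `[7, 8]`. The trailing-window threshold is only one simple design choice. Because the baseline excludes the current day, a spike cannot hide itself by inflating its own threshold. Once a spike enters the window, however, the baseline rises, which suppresses detection for the following days.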
