Information Evolution in Wikipedia

The Web of data is constantly evolving, driven by the dynamics of its content. Current Web search engine technologies treat collections as static and do not factor in explicitly or implicitly available temporal information that can be leveraged to gain insights into the dynamics of the data. In this paper, we hypothesize that by employing the temporal aspect as the primary means for capturing the evolution of entities, it is possible to provide entity-based access to Web archives. We empirically show that the edit activity on Wikipedia can be exploited to provide evidence of the evolution of Wikipedia pages over time, both in terms of their content and in terms of their temporally defined relationships, classified in the literature as events. Finally, we present results from our extensive analysis of a dataset consisting of 31,998 Wikipedia pages describing politicians, along with observations from in-depth case studies. Our findings reflect the usefulness of leveraging temporal information to study the evolution of entities and offer promising ground for further research.
