Wikipedia graph mining: dynamic structure of collective memory

Wikipedia is the largest encyclopedia ever created and the fifth most visited website in the world. Tens of millions of people browse it every day, seeking answers to various questions. Collective user activity on its pages leaves publicly available footprints of human behavior, making Wikipedia an excellent source for the analysis of collective behavior. In this work, we propose a distributed graph-based event extraction model inspired by Hebbian learning theory. The model exploits the collective effect of the dynamics to discover events. We focus on data streams with an underlying graph structure and perform several large-scale experiments on Wikipedia visitor activity data. We provide a distributed implementation of the proposed algorithm and show that the model scales with respect to time-series length and graph density. We extract dynamical patterns of collective activity and demonstrate that they correspond to meaningful clusters of associated events, reflected in the Wikipedia articles. We also illustrate the evolutionary dynamics of the graphs over time to highlight the changing nature of visitors' interests. Finally, we discuss clusters of events that model the collective recall process and represent collective memories, that is, common memories shared by a group of people.
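As a rough illustration of the Hebbian idea on a graph, the following minimal sketch reinforces a hyperlink between two pages whenever both pages show unusually high visitor activity in the same time step, then groups the surviving links into clusters. It is not the authors' algorithm: the burst rule (mean plus `k` standard deviations), the unit weight increment, and the names `active_steps`, `hebbian_weights`, and `clusters` are all assumptions made for this example, and it runs on a single machine rather than in a distributed setting.

```python
from statistics import mean, pstdev


def active_steps(series, k=2.0):
    """Return the time steps where a page's visit count is unusually high.

    Assumed burst rule: count > mean + k * standard deviation.
    """
    mu = mean(series)
    sigma = pstdev(series)
    return {t for t, x in enumerate(series) if x > mu + k * sigma}


def hebbian_weights(activity, edges, k=2.0):
    """Weight each edge by the number of time steps where both ends burst.

    activity: dict mapping page -> list of visit counts per time step
    edges:    iterable of (page, page) pairs, e.g. hyperlinks
    Edges whose endpoints never burst together are dropped.
    """
    bursts = {page: active_steps(series, k) for page, series in activity.items()}
    weights = {}
    for u, v in edges:
        w = len(bursts[u] & bursts[v])  # co-activation count
        if w > 0:
            weights[(u, v)] = w
    return weights


def clusters(weights):
    """Group pages joined by reinforced edges (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in weights:
        parent[find(u)] = find(v)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

In this sketch, two linked pages that spike in the same hour end up in one cluster, while a linked page with flat traffic, or one that spikes at a different time, is left out.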
