The birth of collective memories: Analyzing emerging entities in text streams

We study how collective memories are formed online. We do so by tracking entities that emerge in public discourse, that is, in online text streams such as social media and news streams, before they are incorporated into Wikipedia, which, we argue, can be viewed as an online place for collective memory. By analyzing the temporal patterns between an entity's first mention in online text streams and its subsequent incorporation into collective memory, we gain insight into how the collective remembrance process happens online. Specifically, we analyze nearly 80,000 entities as they emerge in online text streams before they are incorporated into Wikipedia. The text streams we use comprise social media and news streams and span more than 579 million documents over a period of 18 months. We discover two main emergence patterns. Some entities emerge in a “bursty” fashion: they appear in public discourse without precedent, blast into activity, and transition into collective memory. Other entities display a “delayed” pattern: they appear in public discourse, experience a period of inactivity, and then resurface before transitioning into our cultural collective memory.
