Incremental cluster evolution tracking from highly dynamic network data

Dynamic networks are ubiquitous on today's web. In settings such as social networks and social media, they are noisy, large-scale, and fast-evolving. In this paper, we focus on the cluster evolution tracking problem on highly dynamic networks, with a clear application to event evolution tracking. Several previous works on data stream clustering maintain clusters using a node-by-node approach. However, handling bulk updates, i.e., a subgraph at a time, is critical for achieving acceptable performance on very large, highly dynamic networks. We therefore propose a subgraph-by-subgraph incremental tracking framework for cluster evolution. To illustrate the techniques in our framework, we consider event evolution tracking in social streams as an application, where a social stream is modeled as a dynamic post network and an event as a dynamic cluster. Monitoring the network through a fading time window, we introduce a skeletal graph to summarize the information in the dynamic network, and we formalize cluster evolution patterns using a group of primitive evolution operations and their algebra. We develop two incremental computation algorithms to maintain clusters and track evolution patterns as time rolls on and the network evolves. Our detailed experimental evaluation on large Twitter datasets demonstrates that the framework can effectively track the complete set of cluster evolution patterns from highly dynamic networks on the fly.
