Change point detection for burst analysis from an observed information diffusion sequence of tweets

We propose a method for detecting the period in which a burst of information diffusion took place from an observed diffusion sequence over a social network, and report the results of applying it to real Twitter data. We assume a generic information diffusion model in which the time delay associated with diffusion follows an exponential distribution and a burst is directly reflected in changes in the time-delay parameter of that distribution. The parameter's change over time is approximated by a step function, and the problem of detecting the change points and finding the parameter values is formulated as an optimization problem that maximizes the likelihood of generating the observed diffusion sequence. The time complexity of the search is almost proportional to the number of observed data points, and the search has been shown to be very efficient. We first demonstrate on synthetic data that the proposed method can detect the burst and that it performs better than one of the representative state-of-the-art methods, confirming that it covers a wider range of change patterns. We then extend the evaluation on synthetic data, comparing the method with a naive exhaustive search and a simple greedy method, to show that it is both efficient and effective. Finally, we apply the method to real Twitter data from the 2011 Tōhoku earthquake and tsunami and reconfirm its efficiency and effectiveness. Two interesting findings emerge: a burst period detected by the proposed method tends to contain a large number of homogeneous tweets on a specific topic, even if the observed diffusion sequence consists of heterogeneous tweets on various topics; and assuming the information diffusion path to be a line-shaped tree can give a good approximation of the maximum likelihood estimator when the actual diffusion path is not known.
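The likelihood formulation can be illustrated with a minimal sketch. It is not the search procedure proposed in the paper: it handles only a single change point, scans every candidate split exhaustively, and assumes a line-shaped diffusion path, so that each delay is simply the gap between consecutive timestamps. The function names `segment_loglik` and `best_single_change_point` are hypothetical. Within a segment of n delays summing to S, the exponential rate that maximizes the likelihood is n/S, which gives a segment log-likelihood of n log(n/S) - n.

```python
import math
from itertools import accumulate


def segment_loglik(n, total):
    """Maximized exponential log-likelihood of n delays that sum to `total`.

    The maximum-likelihood rate for the segment is n / total.
    """
    rate = n / total
    return n * math.log(rate) - rate * total


def best_single_change_point(times):
    """Return (k, gain) for the best single step change in the delay rate.

    `times` must be strictly increasing timestamps (assumption: line-shaped
    diffusion path, so delays are gaps between consecutive observations).
    k is the number of delays in the first segment; gain is the improvement
    in log-likelihood over a single constant rate. k is None if no split helps.
    """
    delays = [b - a for a, b in zip(times, times[1:])]
    prefix = [0.0] + list(accumulate(delays))  # prefix[k] = sum of first k delays
    n = len(delays)
    base = segment_loglik(n, prefix[n])  # no change point at all
    best_k, best_gain = None, 0.0
    for k in range(1, n):
        ll = (segment_loglik(k, prefix[k])
              + segment_loglik(n - k, prefix[n] - prefix[k]))
        if ll - base > best_gain:
            best_k, best_gain = k, ll - base
    return best_k, best_gain


if __name__ == "__main__":
    # Toy sequence: 50 slow delays followed by 50 delays that are 10x shorter.
    toy_times = [0.0] + list(accumulate([1.0] * 50 + [0.1] * 50))
    k, gain = best_single_change_point(toy_times)
    print("first-segment length:", k, "log-likelihood gain:", round(gain, 2))
```

Prefix sums keep each candidate split at constant cost, so this single-change-point scan is linear in the number of observations. Extending it to several change points, and deciding how many to accept, is where a search strategy such as the one described in the abstract becomes necessary.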
