An Efficient Approach to Event Detection and Forecasting in Dynamic Multivariate Social Media Networks

Anomalous subgraph detection has been successfully applied to event detection in social media. However, the subgraph detection problem becomes challenging when the social media network incorporates abundant attributes, which leads to a multivariate network. The multivariate characteristic makes most existing methods incapable of tackling this problem effectively and efficiently, because it involves joint feature selection and subgraph detection. This combination has not been well addressed in the current literature, especially in dynamic multivariate networks in which attributes evolve over time. This paper presents a generic framework, namely dynamic multivariate evolving anomalous subgraphs scanning (DMGraphScan), to address this problem in dynamic multivariate social media networks. We generalize traditional nonparametric statistics and propose a new class of scan statistic functions that measure the joint significance of evolving subgraphs and subsets of attributes, whose high values indicate ongoing or forthcoming events in dynamic multivariate networks. We reformulate each scan statistic function as a sequence of subproblems with provable guarantees, and then propose an efficient approximation algorithm for tackling each subproblem. This algorithm relies on Lagrangian relaxation and on dynamic programming based on tree-shaped priors. As a case study, we conduct extensive experiments to demonstrate the performance of our proposed approach on two real-world applications from different domains (flu outbreak detection and haze detection).
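To make concrete what a joint (subgraph, attribute-subset) nonparametric scan statistic computes, the following Python sketch scores every connected node set S and nonempty attribute set A of a toy graph by pooling the empirical p-values of all (node, attribute) pairs in S × A. It is a minimal sketch under stated assumptions, not the DMGraphScan method: the choice of the Berk-Jones statistic as the nonparametric scan statistic, the threshold grid `alphas`, the toy graph and p-values, and the helper names (`kl_bernoulli`, `berk_jones`, `is_connected`, `exhaustive_scan`) are all hypothetical. The time dimension is omitted, and the brute-force enumeration stands in for (and is not) the Lagrangian-relaxation and tree-prior dynamic-programming algorithm.

```python
import math
from itertools import combinations


def kl_bernoulli(x: float, a: float) -> float:
    """KL divergence KL(Bernoulli(x) || Bernoulli(a)), using 0 * log(0) = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / a)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - a))
    return total


def berk_jones(pvalues, alphas):
    """Berk-Jones score (assumed statistic): max over alpha of N * KL(N_alpha / N, alpha).

    Only thresholds where the observed fraction of p-values <= alpha exceeds
    alpha (more small p-values than expected under the null) contribute.
    """
    n = len(pvalues)
    if n == 0:
        return 0.0
    best = 0.0
    for a in alphas:
        n_alpha = sum(1 for p in pvalues if p <= a)
        if n_alpha / n > a:
            best = max(best, n * kl_bernoulli(n_alpha / n, a))
    return best


def is_connected(nodes, adj):
    """Check that the subgraph induced by `nodes` is connected (iterative DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def exhaustive_scan(adj, pvals, alphas):
    """Brute-force maximum of the joint score over connected S and nonempty A.

    Exponential in both |V| and the number of attributes: toy inputs only.
    """
    num_attrs = len(next(iter(pvals.values())))
    best = (0.0, None, None)
    for r in range(1, len(adj) + 1):
        for S in combinations(sorted(adj), r):
            if not is_connected(S, adj):
                continue
            for q in range(1, num_attrs + 1):
                for A in combinations(range(num_attrs), q):
                    # Pool the empirical p-values of every (node, attribute) pair in S x A.
                    score = berk_jones([pvals[v][k] for v in S for k in A], alphas)
                    if score > best[0]:
                        best = (score, set(S), A)
    return best


# Hypothetical toy input: a path graph 0-1-2-3-4 with two attributes per node,
# where nodes 0-2 have small (anomalous) p-values on attribute 0 only.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pvals = {
    0: [0.01, 0.6],
    1: [0.02, 0.7],
    2: [0.01, 0.8],
    3: [0.5, 0.9],
    4: [0.6, 0.4],
}
alphas = [0.01, 0.05, 0.1, 0.15]

score, S, A = exhaustive_scan(adj, pvals, alphas)
print(sorted(S), A, round(score, 3))  # [0, 1, 2] (0,) 8.987
```

The search returns the three-node cluster on attribute 0 with score 3 ln 20: adding node 3 or attribute 1 only adds large p-values, which dilutes the pooled evidence and lowers the Berk-Jones score. This is the joint feature-selection-plus-subgraph-detection effect the abstract describes, although the exponential enumeration here is exactly what an efficient approximation algorithm is needed to avoid.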
