Tree decomposition based anomalous connected subgraph scanning for detecting and forecasting events in attributed social media networks

Abstract

Event detection and forecasting in social media networks, for tasks such as disease outbreak and air pollution event detection, have been formulated as an anomalous connected subgraph detection problem. However, the huge search space and the sparsity of anomalous events make this problem difficult to solve effectively and efficiently. This paper presents a general framework, anomalous connected subgraph scanning (GraphScan), which optimizes a large class of sophisticated nonlinear nonparametric scan statistic functions to solve this problem in attributed social media networks. We first transform the nonlinear nonparametric scan statistic functions into the Prize-Collecting Steiner Tree (PCST) problem, with provable guarantees, to evaluate the significance of connected subgraphs that indicate ongoing or forthcoming events. We then use tree decomposition to divide the whole graph into a set of smaller subgraphs (bags) arranged in a tree structure, which dramatically reduces the search space. Finally, we propose an efficient approximation algorithm that detects anomalous subgraphs using the tree of bags. Extensive experiments on two real-world datasets from different domains demonstrate the effectiveness and efficiency of the proposed approach.
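To make the tree-of-bags idea concrete, the following is a minimal Python sketch of a tree decomposition built with the greedy min-degree elimination heuristic. The abstract does not say which decomposition procedure GraphScan uses, so the heuristic, the function name `tree_decomposition`, and the adjacency-dictionary input format are all assumptions made here for illustration only.

```python
from itertools import combinations


def tree_decomposition(adj):
    """Illustrative sketch (not the paper's procedure): greedy min-degree
    elimination.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops).
    Returns (bags, tree_edges), where bags is a list of frozensets of
    vertices and tree_edges is a list of (i, j) index pairs into bags.
    """
    # Work on a copy so the caller's graph is not modified.
    g = {v: set(ns) for v, ns in adj.items()}
    order = []  # vertices in elimination order
    pos = {}    # vertex -> index of the bag created when it was eliminated
    bags = []

    while g:
        # Pick a vertex of minimum current degree (ties broken by name).
        v = min(g, key=lambda u: (len(g[u]), str(u)))
        nbrs = g[v]
        # The bag holds the eliminated vertex and its current neighbours.
        bags.append(frozenset(nbrs | {v}))
        pos[v] = len(order)
        order.append(v)
        # Turn the neighbourhood into a clique (fill-in edges).
        for a, b in combinations(nbrs, 2):
            g[a].add(b)
            g[b].add(a)
        # Remove the eliminated vertex from the graph.
        for u in nbrs:
            g[u].discard(v)
        del g[v]

    # Link each bag to the bag of its earliest-eliminated remaining member.
    tree_edges = []
    for i, v in enumerate(order):
        later = bags[i] - {v}
        if later:
            j = min(pos[u] for u in later)
            tree_edges.append((i, j))
    return bags, tree_edges


# Usage: a 4-cycle 1-2-3-4-1.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
bags, tree_edges = tree_decomposition(cycle)
width = max(len(b) for b in bags) - 1  # width of this decomposition
```

Each bag is a small vertex set, and every edge of the input graph lies inside at least one bag, so a search for connected subgraphs can proceed bag by bag along the tree rather than over the whole graph at once. The heuristic gives no optimality guarantee on the width; it is only one simple way to obtain a valid decomposition.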
