Near-Optimal and Practical Algorithms for Graph Scan Statistics

Scan statistics are a popular approach for detecting “hotspots” and “anomalies” in spatio-temporal and network data. The methodology involves maximizing a score function over all connected subgraphs, which is NP-hard in general. A number of heuristics have been proposed for these problems, but they do not provide any quality guarantees. In this paper, we develop a framework for designing algorithms that optimize a large class of scan statistics for networks, subject to connectivity constraints. Our algorithms run in time that scales linearly with the size of the graph and depends on a parameter we call the “effective solution size”, while providing rigorous approximation guarantees. In contrast, most prior methods have running times that are super-linear in the graph size. Extensive empirical evidence demonstrates the effectiveness and efficiency of our proposed algorithms in comparison with state-of-the-art methods. Our approach outperforms all prior methods, increasing the score by more than 25% in some cases. Further, our algorithms scale to networks with up to a million nodes, which is 1-2 orders of magnitude larger than all prior applications.
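To make the optimization problem concrete, the following is a minimal brute-force sketch of maximizing a score function over connected subgraphs. It is not the algorithm developed in the paper: it enumerates every node subset up to a size bound, which is exponential in general and only feasible on toy graphs. The function names, the adjacency-dict representation, and the elevated-mean style score (sum of node weights divided by the square root of the subset size) are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            # Only walk edges that stay inside the candidate subset.
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def score(nodes, weight):
    """Hypothetical scan statistic: total weight normalized by sqrt(|S|)."""
    return sum(weight[v] for v in nodes) / sqrt(len(nodes))


def best_connected_subgraph(adj, weight, max_size):
    """Exhaustively search connected node subsets of size <= max_size."""
    best_set, best_score = None, float("-inf")
    for k in range(1, max_size + 1):
        for cand in combinations(sorted(adj), k):
            if is_connected(cand, adj):
                s = score(cand, weight)
                if s > best_score:
                    best_set, best_score = set(cand), s
    return best_set, best_score


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 - 4 with made-up node weights.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    weight = {0: 0.1, 1: 2.0, 2: 1.5, 3: -0.5, 4: 0.2}
    print(best_connected_subgraph(adj, weight, max_size=3))
```

The connectivity check is what separates this from an unconstrained subset scan: a set such as {1, 4} may score well but is rejected because its nodes are not linked within the subset. The number of candidate subsets grows combinatorially with the size bound, which is why exhaustive search does not scale beyond very small graphs.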
