SedanSpot: Detecting Anomalies in Edge Streams

Given a stream of edges from a time-evolving (un)weighted and (un)directed graph, we consider the problem of detecting anomalous edges in near real-time using sublinear memory. We propose SedanSpot, a principled randomized algorithm that exploits two tell-tale signs of anomalous edges: they tend to (i) occur as bursts of activity and (ii) connect parts of the graph that are sparsely connected. SedanSpot has the following desirable properties: (a) Burst resistance: it provably downsamples edges from bursty periods of network traffic; (b) Holistic scoring: it takes the whole (sampled) graph into account when scoring the anomalousness of an edge, giving diminishing importance to far-away neighbors; (c) Efficiency: it supports fast updates and scoring, so it can be maintained efficiently over a stream, and it detects anomalous edges in sublinear space and constant time per edge. Through experiments on real-world data, we demonstrate that SedanSpot is 3x faster and 270% more accurate (in terms of AUC) than the state-of-the-art.
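To make the two ingredients above concrete, here is a minimal Python sketch of the general idea: keep a bounded sample of the edge stream, then score a new edge by how hard its endpoint is to reach in the sampled graph. It is an illustration under stated assumptions, not the SedanSpot algorithm itself. The names `EdgeReservoir`, `rwr_proximity`, and `anomaly_score` are hypothetical, the sampler is plain uniform reservoir sampling (which does not provide the burst resistance claimed above), and the scoring function is a simple power-iteration random walk with restart, chosen only because it gives diminishing weight to far-away neighbors.

```python
import random
from collections import defaultdict


class EdgeReservoir:
    """Uniform reservoir sample over an edge stream (stand-in sampler, an assumption)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.edges = []
        self.seen = 0
        self.rng = random.Random(seed)

    def offer(self, edge):
        """Consider one streamed edge; memory stays bounded by `capacity`."""
        self.seen += 1
        if len(self.edges) < self.capacity:
            self.edges.append(edge)
            return
        # Keep the new edge with probability capacity / seen.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.edges[j] = edge


def rwr_proximity(edges, source, restart=0.15, iterations=20):
    """Approximate random-walk-with-restart mass from `source` (edges treated as undirected)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    scores = {source: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        nxt[source] += restart  # restart mass always returns to the source
        for node, mass in scores.items():
            nbrs = adj.get(node)
            if not nbrs:
                # Isolated node: send its walking mass back to the source.
                nxt[source] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(nbrs)
            for n in nbrs:
                nxt[n] += share
        scores = dict(nxt)
    return scores


def anomaly_score(edges, u, v, **kwargs):
    """Higher when `v` is hard to reach from `u` in the sampled graph."""
    return 1.0 - rwr_proximity(edges, u, **kwargs).get(v, 0.0)
```

In this sketch, an edge between two nodes of the same dense cluster gets a lower score than an edge that bridges two otherwise disconnected clusters, which mirrors sign (ii) above. A sampler that weights edges so as to thin out bursts would replace `EdgeReservoir` to address sign (i).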
