Detecting Anomalous Subgraphs on Attributed Graphs via Parametric Flow

Detecting anomalies in structured graph data is becoming a critical task in many applications, such as the analysis of disease infection in communities. To date, however, no efficient method exists for detecting anomalous subgraphs, that is, subgraphs with an abnormal distribution of vertex attributes, on massive attributed graphs with millions of vertices. Here we report that this task can be solved efficiently using a recent graph cut-based formulation. In particular, the full hierarchy of anomalous subgraphs can be obtained simultaneously via the parametric flow algorithm, which allows us to impose a size constraint on anomalous subgraphs. We thoroughly examine the method on synthetic and real-world datasets of various sizes and show that it is more than five orders of magnitude faster than the state-of-the-art method and more effective at detecting anomalous subgraphs.
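The abstract does not spell out the formulation, so the following is only a minimal sketch of how a graph cut can select a subgraph and how a size parameter produces a nested family. It assumes an objective of the form "maximize the sum over selected vertices of (score minus eta), minus lam times the number of cut edges"; the names `scores`, `lam`, `eta`, `source_side_of_min_cut`, and `select_subgraph` are hypothetical and not taken from the paper. Each value of `eta` is solved here by an independent Edmonds-Karp max flow, which is not the parametric flow algorithm; parametric flow is designed to recover such a nested family without re-solving from scratch.

```python
from collections import deque


def source_side_of_min_cut(cap, s, t):
    """Run Edmonds-Karp and return the vertices still reachable from s
    in the final residual graph (the smallest min-cut source side)."""
    # Residual capacities, including zero-capacity reverse arcs.
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path left: the flow is maximum.
            return set(parent) - {s}
        # Find the bottleneck along the path, then push flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u


def select_subgraph(scores, edges, lam, eta):
    """Assumed objective: maximize sum_{v in S} (scores[v] - eta)
    minus lam * (number of edges leaving S), solved as an s-t min cut."""
    s, t = "_source", "_sink"
    cap = {s: {}, t: {}}
    for v, score in scores.items():
        cap.setdefault(v, {})
        gain = score - eta
        if gain > 0:
            cap[s][v] = gain      # rewarded for being selected
        elif gain < 0:
            cap[v][t] = -gain     # penalized for being selected
    for u, v in edges:
        # Undirected edge: pay lam if exactly one endpoint is selected.
        cap[u][v] = cap[u].get(v, 0) + lam
        cap[v][u] = cap[v].get(u, 0) + lam
    return source_side_of_min_cut(cap, s, t)


# Toy path graph a-b-c-d-e with integer anomaly scores (illustrative data).
scores = {"a": 5, "b": 4, "c": 1, "d": 0, "e": 4}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
for eta in (0, 2, 3, 5):
    print(eta, sorted(select_subgraph(scores, edges, lam=1, eta=eta)))
```

Because only the source and sink capacities change monotonically with `eta`, the selected sets shrink as `eta` grows, so sweeping `eta` acts as a size control on the selected subgraph.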
