A Probabilistic Approach to Uncovering Attributed Graph Anomalies

Uncovering subgraphs with an abnormal distribution of attributes yields valuable insight into network behavior. For example, in social or communication networks, diseases or intrusions usually do not propagate uniformly, which makes it critical to find anomalous regions with high concentrations of a specific disease or intrusion. In this paper, we introduce a probabilistic model to identify anomalous subgraphs containing a significantly different percentage of a certain vertex attribute, such as a specific disease or an intrusion, compared to the rest of the graph. Our framework, gAnomaly, models the generative processes of vertex attributes and divides the graph into regions governed by background and anomaly processes. Two types of regularizers are employed to smooth the regions and to facilitate vertex assignment. We use deterministic annealing EM to learn the model parameters; it is less dependent on initialization and better at avoiding local optima. To find fine-grained anomalies, we further propose an iterative procedure. Experiments show that gAnomaly outperforms a state-of-the-art algorithm at uncovering anomalous subgraphs in attributed graphs.
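To make the background/anomaly split and the annealed EM more concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's. Each vertex i is assumed to carry n[i] observations, k[i] of which have the target attribute, and each count is drawn from one of two binomial processes (background or anomaly). The E-step raises prior times likelihood to an inverse temperature beta that is annealed up to 1, which is the core idea of deterministic annealing EM. gAnomaly's two regularizers and its iterative refinement are not reproduced. The neighbour-averaging step controlled by the hypothetical parameter smooth is only a crude stand-in for graph smoothing. All names (daem_binomial_mixture, n, k, adj, betas, smooth) are illustrative.

```python
import math


def daem_binomial_mixture(n, k, adj, betas=(0.2, 0.4, 0.6, 0.8, 1.0),
                          iters_per_beta=50, smooth=0.3, eps=1e-6):
    """Illustrative sketch, not the gAnomaly algorithm.

    Assumed model (hypothetical simplification): vertex i has n[i]
    observations, k[i] of which carry the target attribute, and k[i] is
    Binomial(n[i], theta[z_i]) with z_i in {0: background, 1: anomaly}.
    adj[i] lists the neighbours of vertex i.
    Returns (theta, pi, posterior probability that each vertex is anomalous).
    """
    V = len(n)
    rate = sum(k) / sum(n)
    # Asymmetric start: identical components would never separate.
    theta = [max(eps, 0.5 * rate), min(1 - eps, 1.5 * rate)]
    pi = [0.5, 0.5]
    r = [[0.5, 0.5] for _ in range(V)]
    for beta in betas:  # anneal the inverse temperature up to 1
        for _ in range(iters_per_beta):
            # Tempered E-step: r[i][c] proportional to (pi_c * p(k_i | theta_c)) ** beta.
            # The binomial coefficient is the same for both components, so it cancels.
            for i in range(V):
                logs = [beta * (math.log(pi[c]) + k[i] * math.log(theta[c])
                                + (n[i] - k[i]) * math.log(1 - theta[c]))
                        for c in (0, 1)]
                m = max(logs)
                w = [math.exp(v - m) for v in logs]
                r[i] = [w[0] / (w[0] + w[1]), w[1] / (w[0] + w[1])]
            # Crude stand-in for graph regularization (hypothetical): blend each
            # vertex's anomaly posterior with the average over its neighbours.
            p1 = []
            for i in range(V):
                if adj[i]:
                    avg = sum(r[j][1] for j in adj[i]) / len(adj[i])
                    p1.append((1 - smooth) * r[i][1] + smooth * avg)
                else:
                    p1.append(r[i][1])
            r = [[1 - p, p] for p in p1]
            # M-step: weighted maximum-likelihood updates, clipped away from 0 and 1.
            for c in (0, 1):
                weight = sum(r[i][c] for i in range(V))
                pi[c] = min(1 - eps, max(eps, weight / V))
                num = sum(r[i][c] * k[i] for i in range(V))
                den = sum(r[i][c] * n[i] for i in range(V))
                if den > 0:
                    theta[c] = min(1 - eps, max(eps, num / den))
    return theta, pi, [ri[1] for ri in r]


if __name__ == "__main__":
    # Toy path graph: vertices 8..11 have a much higher attribute rate.
    n = [20] * 12
    k = [2] * 8 + [12] * 4
    adj = [[j for j in (i - 1, i + 1) if 0 <= j < 12] for i in range(12)]
    theta, pi, post = daem_binomial_mixture(n, k, adj)
    print("rates:", [round(t, 3) for t in theta])
    print("anomaly posteriors:", [round(p, 2) for p in post])
```

On the toy path graph, the anomaly posterior should be high on the last four vertices and low elsewhere. Starting from a small beta keeps the early posteriors soft, which is what makes annealing less sensitive to the initial rates. Larger values of smooth blur the region boundary more.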