Scalable Anomaly Ranking of Attributed Neighborhoods

Given a graph with node attributes, what neighborhoods are anomalous? To answer this question, one needs a quality score that utilizes both structure and attributes. Popular existing measures either quantify the structure only and ignore the attributes (e.g., conductance), or only consider the connectedness of the nodes inside the neighborhood and ignore the cross-edges at the boundary (e.g., density). In this work we propose normality, a new quality measure for attributed neighborhoods. Normality utilizes structure and attributes together to quantify both internal consistency and external separability. It exhibits two key advantages over other measures: (1) It allows many boundary-edges as long as they can be "exonerated"; i.e., either (i) are expected under a null model, and/or (ii) the boundary nodes do not exhibit the subset of attributes shared by the neighborhood members. Existing measures, in contrast, penalize boundary edges irrespectively. (2) Normality can be efficiently maximized to automatically infer the shared attribute subspace (and respective weights) that characterize a neighborhood. This efficient optimization allows us to process graphs with millions of attributes. We capitalize on our measure to present a novel approach for Anomaly Mining of Entity Neighborhoods (AMEN). Experiments on real-world attributed graphs illustrate the effectiveness of our measure at anomaly detection, outperforming popular approaches including conductance, density, OddBall, and SODA. In addition to anomaly detection, our qualitative analysis demonstrates the utility of normality as a powerful tool to contrast the correlation between structure and attributes across different graphs.

[1]  M E J Newman,et al.  Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[3]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[4]  Ira Assent,et al.  Evaluating Clustering in Subspace Projections of High Dimensional Data , 2009, Proc. VLDB Endow..

[5]  Jure Leskovec,et al.  Discovering social circles in ego networks , 2012, ACM Trans. Knowl. Discov. Data.

[6]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[7]  M E J Newman Assortative mixing in networks. , 2002, Physical review letters.

[8]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[9]  M. McPherson,et al.  Birds of a Feather: Homophily in Social Networks , 2001 .

[10]  Christos Faloutsos,et al.  oddball: Spotting Anomalies in Weighted Graphs , 2010, PAKDD.

[11]  Jure Leskovec,et al.  Defining and evaluating network communities based on ground-truth , 2012, Knowledge and Information Systems.

[12]  Jiawei Han,et al.  gIceberg: Towards iceberg analysis in large graphs , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[13]  Emmanuel Müller,et al.  Focused clustering and outlier detection in large attributed graphs , 2014, KDD.

[14]  Jiawei Han,et al.  Local Learning for Mining Outlier Subgraphs from Network Datasets , 2014, SDM.

[15]  J. Moody Race, School Integration, and Friendship Segregation in America1 , 2001, American Journal of Sociology.

[16]  Christos Faloutsos,et al.  Weighted graphs and disconnected components: patterns and a generator , 2008, KDD.

[17]  David F. Gleich,et al.  Vertex neighborhoods, low conductance cuts, and good seeds for local community methods , 2012, KDD.

[18]  Nan Li,et al.  A Probabilistic Approach to Uncovering Attributed Graph Anomalies , 2014, SDM.

[19]  Aristides Gionis,et al.  Overlapping community detection in labeled graphs , 2014, Data Mining and Knowledge Discovery.

[20]  M. Newman,et al.  Mixing Patterns and Community Structure in Networks , 2002, cond-mat/0210146.

[21]  Jure Leskovec,et al.  Statistical properties of community structure in large social and information networks , 2008, WWW.

[22]  Moses Charikar,et al.  Greedy approximation algorithms for finding dense components in a graph , 2000, APPROX.

[23]  Charalampos E. Tsourakakis,et al.  Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees , 2013, KDD.

[24]  N. Denton,et al.  The Dimensions of Residential Segregation , 1988 .

[25]  Danai Koutra,et al.  Graph based anomaly detection and description: a survey , 2014, Data Mining and Knowledge Discovery.

[26]  Diane J. Cook,et al.  Graph-based anomaly detection , 2003, KDD '03.

[27]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[28]  Mohammed J. Zaki,et al.  Mining Attribute-structure Correlated Patterns in Large Attributed Graphs , 2012, Proc. VLDB Endow..

[29]  Marco Conti,et al.  Analysis of Ego Network Structure in Online Social Networks , 2012, 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing.

[30]  Thomas Seidl,et al.  Spectral Subspace Clustering for Graphs with Feature Vectors , 2013, 2013 IEEE 13th International Conference on Data Mining.

[31]  Mark Newman,et al.  Networks: An Introduction , 2010 .

[32]  Yizhou Sun,et al.  On community outliers and their efficient detection in information networks , 2010, KDD.

[33]  Christos Faloutsos,et al.  PICS: Parameter-free Identification of Cohesive Subgroups in Large Attributed Graphs , 2012, SDM.

[34]  Yasuhiro Fujiwara,et al.  Fast Algorithm for Modularity-Based Graph Clustering , 2013, AAAI.

[35]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.