Local context selection for outlier ranking in graphs with multiple numeric node attributes

Outlier ranking aims at the distinction between exceptional outliers and regular objects by measuring deviation of individual objects. In graphs with multiple numeric attributes, not all the attributes are relevant or show dependencies with the graph structure. Considering both graph structure and all given attributes, one cannot measure a clear deviation of objects. This is because the existence of irrelevant attributes clearly hinders the detection of outliers. Thus, one has to select local outlier contexts including only those attributes showing a high contrast between regular and deviating objects. It is an open challenge to detect meaningful local contexts for each node in attributed graphs. In this work, we propose a novel local outlier ranking model for graphs with multiple numeric node attributes. For each object, our technique determines its subgraph and its statistically relevant subset of attributes locally. This context selection enables a high contrast between an outlier and the regular objects. Out of this context, we compute the outlierness score by incorporating both the attribute value deviation and the graph structure. In our evaluation on real and synthetic data, we show that our approach is able to detect contextual outliers that are missed by other outlier models.

[1]  Yizhou Sun,et al.  On community outliers and their efficient detection in information networks , 2010, KDD.

[2]  Christos Faloutsos,et al.  PICS: Parameter-free Identification of Cohesive Subgroups in Large Attributed Graphs , 2012, SDM.

[3]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[4]  Emmanuel Müller,et al.  Statistical selection of relevant subspace projections for outlier ranking , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[5]  M. Newman,et al.  Mixing patterns in networks. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[6]  Martin Ester,et al.  Mining Cohesive Patterns from Graphs with Feature Vectors , 2009, SDM.

[7]  Vipin Kumar,et al.  Feature bagging for outlier detection , 2005, KDD '05.

[8]  Jiawei Han,et al.  gSkeletonClu: Density-Based Network Clustering via Structure-Connected Tree Division or Agglomeration , 2010, 2010 IEEE International Conference on Data Mining.

[9]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Peter J. Rousseeuw,et al.  Robust regression and outlier detection , 1987 .

[11]  Deepayan Chakrabarti,et al.  AutoPart: Parameter-Free Graph Partitioning and Outlier Detection , 2004, PKDD.

[12]  Ann R. Cannon Essential Statistics , 2001 .

[13]  Hanghang Tong,et al.  Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection , 2011, SDM.

[14]  Weiru Liu,et al.  Detecting anomalies in graphs with numeric labels , 2011, CIKM '11.

[15]  Sanjay Ranka,et al.  Conditional Anomaly Detection , 2007, IEEE Transactions on Knowledge and Data Engineering.

[16]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[17]  Hong Cheng,et al.  Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..

[18]  M. McPherson,et al.  Birds of a Feather: Homophily in Social Networks , 2001 .

[19]  Xiaowei Xu,et al.  SCAN: a structural clustering algorithm for networks , 2007, KDD '07.

[20]  Fan Chung Graham,et al.  Local Graph Partitioning using PageRank Vectors , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[21]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[22]  Charu C. Aggarwal,et al.  Outlier Analysis , 2013, Springer New York.

[23]  Christos Faloutsos,et al.  Detecting Fraudulent Personalities in Networks of Online Auctioneers , 2006, PKDD.

[24]  Diane J. Cook,et al.  Graph-based anomaly detection , 2003, KDD '03.

[25]  Klemens Böhm,et al.  Outlier Ranking via Subspace Analysis in Multiple Views of the Data , 2012, 2012 IEEE 12th International Conference on Data Mining.

[26]  Hal S. Stern,et al.  CHAPMAN & HALL/CRC Texts in Statistical Science Series , 2002 .

[27]  Klemens Böhm,et al.  HiCS: High Contrast Subspaces for Density-Based Outlier Ranking , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[28]  M. Stephens Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics without Extensive Tables , 1970 .

[29]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[30]  Weng-Keen Wong,et al.  Category detection using hierarchical mean shift , 2009, KDD.

[31]  Jimeng Sun,et al.  Relevance search and anomaly detection in bipartite graphs , 2005, SKDD.

[32]  Klemens Böhm,et al.  Statistical Selection of Congruent Subspaces for Mining Attributed Graphs , 2013, 2013 IEEE 13th International Conference on Data Mining.

[33]  Thomas Seidl,et al.  Spectral Subspace Clustering for Graphs with Feature Vectors , 2013, 2013 IEEE 13th International Conference on Data Mining.

[34]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[35]  Klemens Böhm,et al.  Ranking outlier nodes in subspaces of attributed graphs , 2013, 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW).

[36]  Lawrence B. Holder,et al.  Discovering Structural Anomalies in Graph-Based Data , 2007, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007).

[37]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[38]  Christos Faloutsos,et al.  oddball: Spotting Anomalies in Weighted Graphs , 2010, PAKDD.

[39]  Thomas Seidl,et al.  Subspace Clustering Meets Dense Subgraph Mining: A Synthesis of Two Paradigms , 2010, 2010 IEEE International Conference on Data Mining.

[40]  Huan Liu,et al.  Unsupervised feature selection for linked social media data , 2012, KDD.