Statistical Selection of Congruent Subspaces for Mining Attributed Graphs

Current mining algorithms for attributed graphs exploit dependencies between attribute information and edge structure, referred to as homophily. However, these techniques fail if the homophily assumption does not hold for the full attribute space. In multivariate spaces, some attributes depend strongly on the graph structure while others show no dependency at all. Hence, it is important to select congruent subspaces (i.e., subsets of the node attributes) that show dependencies with the graph structure. In this work, we propose a method for the statistical selection of such congruent subspaces. More specifically, we define a measure that assesses the degree of congruence between a set of attributes and the entire graph. We use it as the core of a statistical test, which congruent subspaces must pass. To illustrate its applicability to common graph mining tasks and to evaluate our selection scheme, we apply it to community outlier detection. Our selection of congruent subspaces enhances outlier detection by measuring outlierness scores in the selected subspaces only. Experiments on attributed graphs show that our approach outperforms traditional full-space approaches and leads to better outlier detection.
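The idea of testing a subspace against the graph structure can be illustrated with a minimal sketch. It is not the congruence measure or the statistical test proposed in this work, neither of which is specified in the abstract. As a stand-in, it assumes a simple permutation test on the mean squared attribute distance across edges. The names `edge_distance` and `congruence_p_value`, the toy graph, and the 0.05 threshold mentioned below are all hypothetical choices made for illustration.

```python
import random

def edge_distance(edges, values, subspace):
    """Mean squared attribute distance across edges, restricted to `subspace`."""
    total = 0.0
    for u, v in edges:
        total += sum((values[u][a] - values[v][a]) ** 2 for a in subspace)
    return total / len(edges)

def congruence_p_value(edges, values, subspace, n_perm=500, seed=0):
    """Permutation p-value (assumed stand-in test, not the paper's):
    how often does a random reassignment of attribute vectors to nodes
    give an edge distance at least as small as the observed one?"""
    rng = random.Random(seed)
    nodes = list(values)
    observed = edge_distance(edges, values, subspace)
    hits = 0
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        # Node n receives the attribute vector of a randomly chosen node s.
        permuted = {n: values[s] for n, s in zip(nodes, shuffled)}
        if edge_distance(edges, permuted, subspace) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy graph (hypothetical): two 5-node cliques joined by one bridge edge.
clique_a, clique_b = range(0, 5), range(5, 10)
edges = [(u, v) for c in (clique_a, clique_b) for u in c for v in c if u < v]
edges.append((4, 5))

# Attribute 0 follows the community structure; attribute 1 is node parity,
# which is unrelated to the communities.
values = {n: (0 if n < 5 else 1, n % 2) for n in range(10)}

p_struct = congruence_p_value(edges, values, subspace=[0])
p_noise = congruence_p_value(edges, values, subspace=[1])
print("attribute 0:", p_struct)
print("attribute 1:", p_noise)
```

Under this sketch, a subspace would be kept only if its p-value falls below a chosen significance level such as 0.05. On the toy graph, attribute 0 passes and attribute 1 does not, so an outlier score would be computed on attribute 0 alone rather than on the full attribute space.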
