Mining Graph Topological Patterns: Finding Covariations among Vertex Descriptors

We propose to mine the topology of a large attributed graph by finding regularities among vertex descriptors. These descriptors are of two types: 1) vertex attributes, which convey information about the vertices themselves, and 2) topological properties, which describe the connectivity of the vertices. Most of these descriptors are numerical or ordinal, and their similarity can be captured by quantifying their covariation. Mining topological patterns relies on frequent pattern mining and graph topology analysis to reveal the links between the relation encoded by the graph and the vertex attributes. We propose three interestingness measures for topological patterns, which differ in the pairs of vertices considered when evaluating up and down covariations between vertex descriptors. We present an efficient algorithm that combines search and pruning strategies to find the most relevant topological patterns. In addition to a classical empirical study, we report case studies on four real-life networks, which show that the approach provides valuable knowledge.
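
To illustrate how covariation between vertex descriptors could be quantified, the sketch below counts the vertex pairs that a set of descriptors orders consistently, in the spirit of Kendall's tau. It is a minimal sketch under stated assumptions, not the measures defined in the paper: the function name `covariation_support`, the toy graph, the `papers` attribute, and the choice between scoring all vertex pairs and scoring only pairs joined by an edge are hypothetical.

```python
from itertools import combinations

# Toy undirected graph as adjacency sets (hypothetical data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Two vertex descriptors: one topological property (degree) and one
# vertex attribute ("papers", an assumed example attribute).
values = {
    "degree": {v: len(neighbours) for v, neighbours in adj.items()},
    "papers": {"a": 10, "b": 6, "c": 4, "d": 1},
}


def covariation_support(values, pattern, pairs):
    """Fraction of vertex pairs ordered consistently by every descriptor.

    pattern maps a descriptor name to +1 (up) or -1 (down). A pair (u, v)
    supports the pattern when all signed differences are strictly positive
    or all are strictly negative; ties never count as support.
    This is an assumed Kendall-tau-style measure, not the paper's definition.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    supporting = 0
    for u, v in pairs:
        diffs = [sign * (values[d][v] - values[d][u]) for d, sign in pattern.items()]
        if all(x > 0 for x in diffs) or all(x < 0 for x in diffs):
            supporting += 1
    return supporting / len(pairs)


# Pattern: degree and papers vary in the same direction ("up, up").
pattern = {"degree": +1, "papers": +1}

# Variant 1: every pair of vertices.
all_pairs = list(combinations(sorted(adj), 2))
support_all = covariation_support(values, pattern, all_pairs)

# Variant 2: only pairs of vertices joined by an edge.
edge_pairs = [(u, v) for u in sorted(adj) for v in sorted(adj[u]) if u < v]
support_edges = covariation_support(values, pattern, edge_pairs)

print(support_all, support_edges)
```

The two calls use the same pattern and differ only in the set of vertex pairs passed in, which is the kind of choice the abstract describes as distinguishing its interestingness measures.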