Mining rich graphs: a graph transformation approach

A large number of studies have addressed mining graph patterns; however, most of them process only graphs whose nodes and edges each carry a single label. Often there is a need to represent a rich dataset as graphs whose nodes and edges carry multiple labels, with each label representing a feature in the dataset. For example, in a social network, multiple features could be associated with individuals (e.g., age, address) and with their relationships (e.g., friend, foe, sends message). To analyze these rich datasets, existing graph mining algorithms need to be extended to mine rich graphs whose nodes and edges have multiple labels. In this paper, we propose a novel algorithm and framework that transforms richly labeled graphs (i.e., graphs with nodes and edges having multiple labels) into an equivalent set of simple labeled graphs (i.e., graphs with nodes and edges having single labels). The resulting simple graphs can be fed to most available graph mining algorithms to produce simple graph patterns. A reverse translation process is then employed to recover rich graph patterns from the simple graph patterns. We demonstrate that our proposed algorithm is scalable on various synthetic and real datasets. We experiment with three notable graph mining algorithms: gSpan, CloseGraph, and Top-k LEAP. We show that our algorithm and framework can complement existing simple graph mining algorithms, allowing them to mine rich graphs.
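To make the idea of a forward transformation and a reverse translation concrete, the following is a minimal sketch of one possible encoding. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The assumption here is a "satellite" encoding: each multi-labeled node becomes a hub node with a placeholder label plus one satellite node per label, and each multi-labeled edge becomes an intermediate hub node treated the same way. The function names `to_simple` and `to_rich`, the placeholder labels `NODE`, `EDGE`, `has`, `src`, and `dst`, and the example data are all hypothetical.

```python
def to_simple(nodes, edges):
    """Encode a richly labeled graph as a single-label graph.

    nodes: {node_id: set of labels}
    edges: {(u, v): set of labels}
    Returns (simple_nodes, simple_edges), each mapping a key to ONE label.
    Illustrative encoding only; not the paper's transformation.
    """
    simple_nodes, simple_edges = {}, {}
    for node_id, labels in nodes.items():
        hub = ("n", node_id)
        simple_nodes[hub] = "NODE"              # placeholder hub label
        for label in sorted(labels):
            sat = ("nl", node_id, label)        # one satellite per label
            simple_nodes[sat] = label
            simple_edges[(hub, sat)] = "has"
    for (u, v), labels in edges.items():
        hub = ("e", u, v)                       # edge becomes a hub node
        simple_nodes[hub] = "EDGE"
        simple_edges[(("n", u), hub)] = "src"
        simple_edges[(hub, ("n", v))] = "dst"
        for label in sorted(labels):
            sat = ("el", u, v, label)
            simple_nodes[sat] = label
            simple_edges[(hub, sat)] = "has"
    return simple_nodes, simple_edges


def to_rich(simple_nodes, simple_edges):
    """Reverse translation: rebuild label sets from the encoded keys.

    simple_edges is accepted for symmetry but not needed, because this
    sketch stores the origin of every satellite in its key.
    """
    nodes, edges = {}, {}
    for key in simple_nodes:
        if key[0] == "n":
            nodes[key[1]] = set()
        elif key[0] == "e":
            edges[(key[1], key[2])] = set()
    for key, label in simple_nodes.items():
        if key[0] == "nl":
            nodes[key[1]].add(label)
        elif key[0] == "el":
            edges[(key[1], key[2])].add(label)
    return nodes, edges


# Hypothetical social-network fragment in the spirit of the example above.
example_nodes = {"alice": {"age:30", "city:Paris"}, "bob": {"age:25"}}
example_edges = {("alice", "bob"): {"friend", "sends_message"}}
simple_nodes, simple_edges = to_simple(example_nodes, example_edges)
```

In this sketch every node and edge of the encoded graph carries exactly one label, so it could in principle be handed to a single-label miner, and `to_rich` recovers the original label sets. A real reverse translation would have to work on mined patterns, not on the full encoded graph, which this sketch does not attempt.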
