NODAR: mining globally distributed substructures from a single labeled graph

Data mining in structured and semi-structured data focuses on frequent data values, whereas graph data mining focuses on frequently occurring topologies. Graph mining, despite its ubiquity, is a difficult task because it requires subgraph isomorphism testing, which is known to be NP-complete. To prune the search space effectively and thereby save computation time, a graph mining algorithm requires that the support measure of a pattern be no greater than that of any of its subpatterns. This property of the support measure is referred to in the literature as down-closure, anti-monotonicity, or admissibility. Unfortunately, when mining a single labeled graph, simply counting the occurrences of a graph pattern may not satisfy the down-closure property. For this reason, most existing approaches mine frequent substructures in a set of labeled graphs (also called the transactional setting), and few efforts have been devoted to mining frequent, globally distributed substructures in a single labeled graph. In this paper, we propose a graph mining algorithm, called NODAR (Non-Overlapping embeDding based grAph mineR), for computing common and globally distributed substructures in a single labeled graph. NODAR adopts a Depth-First Search (DFS) strategy and uses SMNOES (Size of Maximum Non-Overlapping Embedding Set) as its support measure. The core idea of NODAR is to extract frequent subpatterns automatically, without computing their frequency: by the down-closure property of SMNOES, every subpattern of a frequent pattern is itself frequent. This strategy reduces the number of subgraph isomorphism tests needed to compute pattern frequencies. Experimental results on monograph (single-graph) and transactional graph databases, together with comparisons against well-known probabilistic and exact algorithms, demonstrate the efficacy of NODAR.
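
The down-closure issue can be illustrated with a small sketch. It is not NODAR itself: it is a brute-force toy, and the helper names (`embeddings`, `smnoes`), the graph encoding, and the choice to treat two embeddings as overlapping when they share a vertex are all assumptions made for illustration. On a star graph with one `a` vertex and three `b` neighbors, plain occurrence counting gives the edge pattern `a-b` a higher count than its subpattern `a`, while the size of a maximum set of non-overlapping embeddings does not grow.

```python
from itertools import combinations, permutations


def embeddings(pattern, graph):
    """Return the distinct images of label-preserving embeddings of pattern in graph.

    Both arguments are (labels, edges) pairs: labels maps vertex -> label and
    edges is a list of undirected vertex pairs. Brute force, toy inputs only.
    """
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    g_edge_set = {frozenset(e) for e in g_edges}
    p_nodes = sorted(p_labels)
    found = set()
    for image in permutations(sorted(g_labels), len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Vertex labels must match under the candidate mapping.
        if any(p_labels[u] != g_labels[mapping[u]] for u in p_nodes):
            continue
        # Every pattern edge must map onto an edge of the graph.
        mapped = frozenset(frozenset((mapping[u], mapping[v])) for u, v in p_edges)
        if mapped <= g_edge_set:
            found.add((frozenset(image), mapped))
    return list(found)


def smnoes(embs):
    """Size of a largest set of pairwise vertex-disjoint embeddings.

    Assumption: "overlap" means sharing at least one vertex. Exhaustive
    search, so exponential in the number of embeddings.
    """
    for k in range(len(embs), 0, -1):
        for subset in combinations(embs, k):
            if all(a[0].isdisjoint(b[0]) for a, b in combinations(subset, 2)):
                return k
    return 0


# Star graph: vertex 0 is labeled 'a'; its three neighbors are labeled 'b'.
star = ({0: "a", 1: "b", 2: "b", 3: "b"}, [(0, 1), (0, 2), (0, 3)])
vertex_a = ({"x": "a"}, [])                        # subpattern: a single 'a' vertex
edge_ab = ({"x": "a", "y": "b"}, [("x", "y")])     # superpattern: an a-b edge

for name, pat in [("a", vertex_a), ("a-b", edge_ab)]:
    embs = embeddings(pat, star)
    print(name, "occurrences:", len(embs), "non-overlapping:", smnoes(embs))
```

Here the occurrence count rises from 1 to 3 when the pattern is extended, which violates down-closure, whereas the non-overlapping measure stays at 1 for both patterns because all three `a-b` embeddings share vertex 0.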
