An improved unique canonical labeling for frequent subgraph mining

Frequent subgraph mining is a fundamental task that has been widely explored in application domains such as computational biology, social network analysis, chemical structure analysis and web mining. The problem is challenging for two reasons: the number of candidate subgraphs grows exponentially with graph size, and deciding whether two candidates are isomorphic is computationally expensive (the related subgraph isomorphism problem is NP-complete). Canonical labeling is a standard way to handle graph and subgraph isomorphism: each graph is assigned a label such that two graphs receive the same label if and only if they are isomorphic. In this paper we propose a systematic approach and formulate an algorithm that constructs a canonical label for a graph (or subgraph), uniquely identifying it on the basis of special invariant properties of graphs. Our experimental evaluation shows that the algorithm handles canonical labeling and graph isomorphism effectively and reduces the computational cost.
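To make the idea of a canonical label concrete, the following is a minimal brute-force sketch and not the algorithm proposed in the paper. It assumes an undirected graph with vertex labels, uses the pair (vertex label, degree) as an illustrative invariant to restrict which vertex orderings are tried, and takes the lexicographically smallest (label sequence, upper-triangle adjacency) code as the canonical label. The function name `canonical_label` and its parameters are hypothetical.

```python
from itertools import permutations, product


def canonical_label(n, edges, vlabels):
    """Return a canonical code for an undirected, vertex-labeled graph.

    n       -- number of vertices, numbered 0..n-1
    edges   -- iterable of (u, v) pairs
    vlabels -- list of vertex labels, one per vertex
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    degree = [sum(row) for row in adj]

    # Group vertices by an isomorphism-invariant key (label, degree).
    # Vertices in different groups can never be mapped onto each other.
    groups = {}
    for v in range(n):
        groups.setdefault((vlabels[v], degree[v]), []).append(v)
    ordered_groups = [groups[key] for key in sorted(groups)]

    best = None
    # Permute vertices only within each group; groups keep a fixed order.
    for combo in product(*(permutations(g) for g in ordered_groups)):
        order = [v for block in combo for v in block]
        code = (
            tuple(vlabels[v] for v in order),
            tuple(adj[order[i]][order[j]]
                  for i in range(n) for j in range(i + 1, n)),
        )
        if best is None or code < best:
            best = code
    return best


# Two drawings of the same labeled path C-O-C, numbered differently.
g1 = canonical_label(3, [(0, 1), (1, 2)], ["C", "O", "C"])
g2 = canonical_label(3, [(0, 2), (2, 1)], ["C", "C", "O"])
# A different graph: path O-C-C.
g3 = canonical_label(3, [(0, 1), (1, 2)], ["O", "C", "C"])
print(g1 == g2, g1 == g3)
```

Isomorphic graphs yield equal codes, so an isomorphism test reduces to comparing labels. The search remains factorial in the size of the largest invariant group, which is why stronger invariants matter for reducing the cost in practice.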
