Reducing the Number of Canonical Form Tests for Frequent Subgraph Mining

Frequent connected subgraph (FCS) mining is an interesting problem with wide real-world applications. Most FCS mining algorithms detect duplicate candidates using canonical form tests. These tests have high computational complexity and therefore limit the efficiency of graph miners. In this paper, we introduce novel properties that reduce the number of canonical form tests in FCS mining. Based on these properties, we present a new FCS mining algorithm called gRed. Experiments on real-world datasets show the impact of the proposed properties on the efficiency of gRed, which performs fewer canonical form tests than gSpan. In addition, the performance of gRed is compared against gSpan and other state-of-the-art algorithms.
