A new algorithm for mining frequent connected subgraphs based on adjacency matrices

Most Frequent Connected Subgraph Mining (FCSM) algorithms focus on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which reduces the efficiency of graph miners. In this paper, we introduce novel properties of canonical adjacency matrices that reduce the number of CF tests in FCSM. Based on these properties, we propose a new algorithm for frequent connected subgraph mining called grCAM. Experiments on real-world datasets show the impact of the proposed properties in FCSM. We also compare the performance of our algorithm against other reported algorithms.
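To make the cost of a CF test concrete, the following minimal sketch computes a canonical code for a small labeled graph by brute force. It is not grCAM and does not use the properties introduced in the paper. It assumes one common convention for canonical adjacency matrices (the code of a matrix is its lower triangle read row by row, with vertex labels on the diagonal, and the canonical form is the maximal code over all vertex orders). The function names, the `"0"` marker for a missing edge, and the toy graphs are hypothetical.

```python
from itertools import permutations


def matrix_code(vertex_labels, edges, order):
    """Code of the adjacency matrix induced by one vertex order.

    The lower triangle is read row by row; each row ends with the
    diagonal entry, which holds the vertex label.
    """
    code = []
    for i, u in enumerate(order):
        for v in order[:i]:
            # Off-diagonal entry: edge label, or "0" when there is no edge
            # (assumed marker, chosen to sort below the labels used here).
            code.append(edges.get(frozenset((u, v)), "0"))
        code.append(vertex_labels[u])
    return tuple(code)


def canonical_code(vertex_labels, edges):
    """Maximal code over all n! vertex orders (brute force)."""
    return max(
        matrix_code(vertex_labels, edges, order)
        for order in permutations(vertex_labels)
    )


def is_canonical(vertex_labels, edges, order):
    """Naive CF test: does this vertex order yield the canonical code?"""
    return matrix_code(vertex_labels, edges, order) == canonical_code(
        vertex_labels, edges
    )


# Two isomorphic toy graphs with different vertex identifiers.
labels_1 = {0: "a", 1: "b", 2: "a"}
edges_1 = {frozenset((0, 1)): "x", frozenset((1, 2)): "y"}

labels_2 = {"p": "a", "q": "a", "r": "b"}
edges_2 = {frozenset(("p", "r")): "y", frozenset(("q", "r")): "x"}

same_graph = canonical_code(labels_1, edges_1) == canonical_code(labels_2, edges_2)
```

Because `canonical_code` enumerates every vertex order, a single naive test on a graph with n vertices examines n! matrices, which is why skipping CF tests matters for a miner that generates many candidates.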
