Full duplicate candidate pruning for frequent connected subgraph mining

Support calculation and duplicate detection are the most challenging subtasks in frequent connected subgraph (FCS) mining, and neither can be avoided. Because existing solutions for both subtasks have high computational complexity, the most successful FCS mining algorithms have focused on optimizing them. In this paper, we propose two novel properties that allow all duplicate candidates to be removed before support calculation. We also introduce a fast support calculation strategy based on embedding structures. These properties and the new embedding structure are used to design two new algorithms: gdFil, for mining all FCSs, and gdClosed, for mining all closed FCSs. The experimental results show that the proposed algorithms outperform the other well-known algorithms considered in the comparison.
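To make the terms in the abstract concrete, the following is a minimal sketch of the standard definitions of support and closedness computed from embedding lists. It is not the gdFil or gdClosed implementation. The `Embedding` record and the functions `support`, `is_frequent`, and `is_closed` are hypothetical names introduced for illustration only, and the sketch assumes the embeddings of each pattern have already been enumerated.

```python
from collections import namedtuple

# Hypothetical record (not from the paper): one occurrence of a pattern
# inside a database graph, identified by the graph and the matched vertices.
Embedding = namedtuple("Embedding", ["graph_id", "vertex_map"])


def support(embeddings):
    """Count the distinct database graphs holding at least one embedding."""
    return len({e.graph_id for e in embeddings})


def is_frequent(embeddings, min_support):
    """A pattern is frequent if its support reaches the threshold."""
    return support(embeddings) >= min_support


def is_closed(pattern_embeddings, extension_embedding_lists):
    """A pattern is closed if every given super-pattern has lower support.

    Assumption: extension_embedding_lists holds one embedding list per
    one-edge extension of the pattern.
    """
    s = support(pattern_embeddings)
    return all(support(ext) < s for ext in extension_embedding_lists)


# Toy data: the pattern occurs twice in graph 0 and once in graphs 1 and 2.
pattern = [
    Embedding(0, (0, 1)),
    Embedding(0, (2, 3)),
    Embedding(1, (0, 1)),
    Embedding(2, (4, 5)),
]
ext_a = [Embedding(0, (0, 1, 2)), Embedding(1, (0, 1, 2))]
ext_b = [Embedding(0, (0, 1, 4)), Embedding(1, (0, 1, 3)), Embedding(2, (4, 5, 6))]

pattern_support = support(pattern)
closed_vs_a = is_closed(pattern, [ext_a])
closed_vs_both = is_closed(pattern, [ext_a, ext_b])
```

In this toy data the pattern has support 3 because multiple embeddings in the same graph count once. It is closed with respect to `ext_a` alone (support 2), but not once `ext_b` (support 3) is considered.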
