A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston

Several new miners for frequent subgraphs have been published recently. Whereas the new approaches are presented in detail, their quantitative evaluations are often of limited value: performance is discussed on only a small set of graph databases, and the new algorithm is often compared to only a single competitor, on the basis of an executable. It remains unclear how the algorithms perform on larger or different graph databases, and which of their distinctive features is best suited to which database. We have re-implemented the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base, with the same level of programming expertise and optimization effort. This paper presents the results of a comparative benchmark that ran the algorithms on a comprehensive set of graph databases.
