SyGMA: Reducing Symmetry in Graph Mining

While recent algorithms for mining the frequent subgraphs of a database are efficient in the general case, these algorithms tend to do poorly on databases that have a few or no labels. Although little attention has been given to such datasets, there are many existing applications which deal with this type of data. In this paper, we present a novel algorithm, called SyGMA, that improves frequent subgraph mining in such cases by limiting the impact of symmetry on calculations, without the use of memory-expensive structures. Through experimentation on various datasets, we show that our algorithm outperforms, in many cases, one of the leading algorithms for this task.

[1]  Joost N. Kok,et al.  The Gaston Tool for Frequent Subgraph Mining , 2005, GraBaTs.

[2]  Takashi Washio,et al.  An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data , 2000, PKDD.

[3]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[4]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[5]  George Karypis,et al.  Finding Frequent Patterns in a Large Sparse Graph* , 2005, Data Mining and Knowledge Discovery.

[6]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[7]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[8]  Ashwin Srinivasan,et al.  The Predictive Toxicology Evaluation Challenge , 1997, IJCAI.

[9]  Thorsten Meinl,et al.  A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston , 2005, PKDD.