TKG: Efficient Mining of Top-K Frequent Subgraphs

Frequent subgraph mining is a popular data mining task that consists of finding all subgraphs that appear in at least minsup graphs of a graph database. An important limitation of traditional frequent subgraph mining algorithms is that the minsup parameter is hard to set. If it is set too high, few patterns are found and useful information may be missed. If it is set too low, runtimes can become very long and a huge number of patterns may be returned. Finding a minsup value that yields just enough patterns can thus be very time-consuming. This paper addresses this limitation by proposing an efficient algorithm named TKG to find the top-k frequent subgraphs, where the only parameter is k, the number of patterns to be found. The algorithm uses a dynamic search procedure to always explore the most promising patterns first. An extensive experimental evaluation shows that TKG has excellent performance and that it provides a valuable alternative to traditional frequent subgraph mining algorithms.
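The idea of replacing minsup with k and exploring the most promising patterns first can be illustrated with a small sketch. This is not the TKG algorithm itself. It assumes a strongly simplified setting in which each graph is just a set of labeled edges and a pattern is any subset of edges, so connectivity and subgraph isomorphism are ignored. The function name `top_k_patterns`, the internal threshold that starts at 1 and is raised once k patterns are known, and the toy database `db` are all assumptions made for illustration.

```python
import heapq
from itertools import count


def top_k_patterns(database, k):
    """Return the k most frequent edge sets as (support, pattern) pairs.

    Simplifying assumption: a graph is a set of labeled edges and a pattern
    is a sorted tuple of edges, so no isomorphism checking is performed.
    """
    minsup = 1          # internal threshold (assumed), raised during the search
    top = []            # min-heap holding the best k patterns found so far
    candidates = []     # max-heap (negated support) of patterns to extend
    tick = count()      # tiebreaker so the heaps never compare patterns
    edges = sorted({e for g in database for e in g})

    def support(pattern):
        # Number of graphs that contain every edge of the pattern.
        edge_set = set(pattern)
        return sum(1 for g in database if edge_set <= g)

    def consider(pattern):
        nonlocal minsup
        sup = support(pattern)
        if sup < minsup:
            return                      # cannot enter the current top-k
        t = next(tick)
        # -t makes the newest pattern lose ties, so earlier finds are kept.
        heapq.heappush(top, (sup, -t, pattern))
        if len(top) > k:
            heapq.heappop(top)
        if len(top) == k:
            minsup = top[0][0]          # raise the threshold to the k-th best
        heapq.heappush(candidates, (-sup, t, pattern))

    for e in edges:
        consider((e,))

    while candidates:
        neg_sup, _, pattern = heapq.heappop(candidates)
        if -neg_sup < minsup:
            break                       # every remaining candidate is weaker
        # Extend only with larger edges so each edge set is generated once.
        for e in edges:
            if e > pattern[-1]:
                consider(pattern + (e,))

    return sorted(((s, p) for s, _, p in top), key=lambda x: (-x[0], x[1]))


# Toy database: four graphs, each given as a set of labeled edges.
db = [
    {"A-B", "B-C", "C-D"},
    {"A-B", "B-C"},
    {"A-B", "C-D"},
    {"A-B", "B-C", "C-D"},
]
print(top_k_patterns(db, 3))
```

Because support can only decrease when a pattern is extended, popping the candidate with the highest support first lets the threshold rise quickly, and the search can stop as soon as the best remaining candidate falls below it. A real frequent subgraph miner would additionally need canonical pattern encodings and isomorphism-aware support counting, which this sketch omits.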
