Improvised Apriori with frequent subgraph tree for extracting frequent subgraphs

Graphs are among the best-studied data structures in discrete mathematics and computer science, and data mining on graphs has become popular in recent years. The problem of finding frequent itemsets in conventional data mining on transactional databases is thus transformed into the discovery of subgraphs that occur frequently in a graph dataset containing either a single graph or multiple graphs. Most existing algorithms for frequent subgraph discovery adopt an Apriori approach based on candidate set generation followed by testing. The problem with this approach is the cost of candidate set generation, particularly when there are many large subgraphs. The research goals in frequent subgraph discovery are to develop (i) mechanisms that effectively generate candidate subgraphs while excluding duplicates and (ii) processing techniques that generate only the candidate subgraphs necessary to discover the useful and desired frequent subgraphs. In this paper, a two-phase approach is proposed that integrates the Apriori algorithm on graphs with a frequent subgraph (FS) tree to discover frequent subgraphs in graph datasets.
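
The generate-and-test scheme described above can be illustrated with a minimal sketch. It is not the two-phase method proposed in the paper and does not build an FS tree; it only shows generic level-wise Apriori mining under a strong simplifying assumption: every vertex label is unique within a graph, so a subgraph is identified by its edge set and no isomorphism test is needed. The names `edge`, `is_connected`, `support`, and `frequent_subgraphs` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations


def edge(u, v):
    """Canonical form of an undirected edge (assumes comparable labels)."""
    return tuple(sorted((u, v)))


def is_connected(edges):
    """Return True if the non-empty edge set forms one connected subgraph."""
    edges = list(edges)
    seen = set(edges[0])          # vertices reached so far
    remaining = edges[1:]
    changed = True
    while changed:
        changed = False
        for e in list(remaining):
            if e[0] in seen or e[1] in seen:
                seen.update(e)
                remaining.remove(e)
                changed = True
    return not remaining


def support(candidate, graphs):
    """Number of graphs in the dataset that contain every edge of candidate."""
    return sum(1 for g in graphs if candidate <= g)


def frequent_subgraphs(graphs, min_support):
    """Level-wise (Apriori-style) search for frequent connected edge sets.

    graphs: list of sets of canonical edges (assumption: unique vertex labels).
    Returns a dict mapping each frequent edge set to its support.
    """
    all_edges = set().union(*graphs)
    level = {frozenset([e]) for e in all_edges
             if support(frozenset([e]), graphs) >= min_support}
    result = {}
    while level:
        for s in level:
            result[s] = support(s, graphs)
        # Candidate generation: join two frequent k-edge subgraphs that
        # differ in exactly one edge; the set removes duplicate candidates.
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            if len(c) == len(a) + 1 and is_connected(c):
                candidates.add(c)
        # Test step: keep only candidates that meet the support threshold.
        level = {c for c in candidates if support(c, graphs) >= min_support}
    return result
```

Storing candidates in a set is one simple way to exclude duplicates, which corresponds to goal (i) above. With general labeled graphs, where labels repeat, the containment check in `support` would have to be replaced by a subgraph isomorphism test, and that test is where much of the cost of candidate-based methods arises.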
