A comparative survey of algorithms for frequent subgraph discovery

Graph mining is a well-explored research area in which frequent subgraph discovery is an important problem. Understanding the various frequent subgraph discovery algorithms and assessing their suitability for different application scenarios requires a common framework for studying them. This article addresses that need with a classification scheme that emphasizes the intrinsic characteristics of these algorithms: the search strategy, the nature of the input, and the completeness of the output. A short discussion of a few more recent algorithms is also included. For completeness, an experimental evaluation explores the relevance and applicability of a subset of these algorithms to some current application scenarios.
