A survey of frequent subgraph mining algorithms

Graph mining is an important research area within data mining. The field concentrates on identifying frequent subgraphs within graph data sets. Its research goals are twofold: (i) effective mechanisms for generating candidate subgraphs without generating duplicates, and (ii) efficient ways of processing the generated candidates so as to identify the desired frequent subgraphs in a manner that is both computationally efficient and procedurally effective. This paper presents a survey of current research in frequent subgraph mining and proposes solutions to address the main research issues.
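As a rough illustration of the two goals above, the following minimal sketch mines only single-edge patterns from a small database of vertex-labelled graphs. It is not taken from any of the surveyed algorithms: the function names `canonical_edge` and `frequent_edges`, the input layout, and the toy data are all assumptions made for this example. Duplicate candidates are avoided by mapping each edge to a canonical key, and a candidate's support is taken to be the number of graphs in which it occurs.

```python
from collections import Counter


def canonical_edge(label_u, label_v):
    """Order the endpoint labels so (A, B) and (B, A) yield the same key."""
    return tuple(sorted((label_u, label_v)))


def frequent_edges(graphs, min_support):
    """Return single-edge patterns that occur in at least min_support graphs.

    graphs is a list of (labels, edges) pairs (a hypothetical layout):
    labels maps vertex id -> label, edges is an iterable of (u, v) pairs.
    """
    support = Counter()
    for labels, edges in graphs:
        # A set ensures each pattern is counted at most once per graph.
        seen = {canonical_edge(labels[u], labels[v]) for u, v in edges}
        support.update(seen)
    return {p: s for p, s in support.items() if s >= min_support}


# Toy database of three small labelled graphs (illustrative data only).
graphs = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),
    ({0: "O", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "N"}, [(0, 1)]),
]
result = frequent_edges(graphs, min_support=2)
```

Sorting two labels is enough to canonicalise a single undirected edge. For larger patterns, a canonical form must be invariant under every vertex renumbering, and support counting requires subgraph isomorphism tests, which is where most of the computational cost arises.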
