Frequent subgraph mining in outerplanar graphs

Recent years have seen increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem can be solved in incremental polynomial time for tree datasets, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we consider the class of outerplanar graphs, a strict generalization of trees. We develop a frequent subgraph mining algorithm for outerplanar graphs and show that it runs in incremental polynomial time for the practically relevant subclass of well-behaved outerplanar graphs, i.e., those that have only polynomially many simple cycles. We evaluate the algorithm empirically on chemo- and bioinformatics applications.
