Graph Database Indexing Using Structured Graph Decomposition

We introduce a novel method for indexing graph databases that supports subgraph isomorphism and similarity queries. The index consists of two major data structures. The primary structure is a directed acyclic graph that contains one node for each unique induced subgraph of the database graphs. The secondary structure is a hash table that cross-indexes each subgraph for fast isomorphic lookup. To create a hash key that is independent of isomorphism, we use a code-based canonical representation of adjacency matrices, which we have further refined to improve computation speed. We validate the concept by demonstrating its effectiveness in answering queries on two practical datasets. Our experiments show that, for subgraph isomorphism queries, our method outperforms existing methods by more than an order of magnitude.
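The two structures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `DecompositionIndex`, the helper `canonical_code`, and the node layout are hypothetical names chosen for illustration. The canonical code here is a brute-force minimum over all vertex orderings, which stands in for the refined code-based representation the abstract mentions. The sketch also assumes undirected, vertex-labeled graphs and enumerates every non-empty vertex subset, so it is only practical for very small graphs.

```python
from itertools import combinations, permutations


def canonical_code(vertices, edges, labels):
    """Isomorphism-invariant key for the subgraph induced by `vertices`.

    Brute force (an assumption of this sketch): take the smallest tuple of
    (vertex labels, upper-triangle adjacency bits) over all vertex orderings.
    """
    best = None
    for perm in permutations(vertices):
        code = tuple(labels[v] for v in perm) + tuple(
            1 if frozenset((perm[i], perm[j])) in edges else 0
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
        )
        if best is None or code < best:
            best = code
    return best


class DecompositionIndex:
    """Hypothetical index: a hash table whose entries are also DAG nodes."""

    def __init__(self):
        # Hash table: canonical code -> {"graphs": ids, "children": codes}
        self.nodes = {}

    def add_graph(self, graph_id, labels, edge_list):
        edges = {frozenset(e) for e in edge_list}
        vertices = list(labels)
        for k in range(1, len(vertices) + 1):
            for subset in combinations(vertices, k):
                code = canonical_code(subset, edges, labels)
                node = self.nodes.setdefault(
                    code, {"graphs": set(), "children": set()}
                )
                node["graphs"].add(graph_id)
                if k > 1:
                    # DAG edges: link to each subgraph with one vertex removed.
                    for v in subset:
                        rest = tuple(u for u in subset if u != v)
                        node["children"].add(canonical_code(rest, edges, labels))

    def query(self, labels, edge_list):
        """Ids of database graphs containing the query as an induced subgraph."""
        edges = {frozenset(e) for e in edge_list}
        node = self.nodes.get(canonical_code(tuple(labels), edges, labels))
        return set() if node is None else set(node["graphs"])


index = DecompositionIndex()
# g1: path C - C - O; g2: triangle of three C vertices (toy data).
index.add_graph("g1", {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
index.add_graph("g2", {0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (0, 2)])

cc_edge = index.query({"x": "C", "y": "C"}, [("x", "y")])
co_edge = index.query({"x": "C", "y": "O"}, [("x", "y")])
cc_no_edge = index.query({"x": "C", "y": "C"}, [])
```

In this sketch a query is answered with one canonical-code computation and one hash lookup, because every induced subgraph of every database graph was decomposed and stored when the index was built. The cost moves to index construction, which is exponential in graph size here.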
