PrefIndex: An Efficient Supergraph Containment Search Technique

Graphs are prevailingly used in many applications to model complex data structures. In this paper, we study the problem of super-graph containment search. To avoid the NP-complete subgraph isomorphism test, most existing works follow the filtering-verification framework and select graph-features to build effective indexes, which filter false results (graphs) before conducting the costly verification. However, searching features multiple times in the query graphs yields huge redundant computation, which leads to the emergence of the computation-sharing framework. This paper follows the roadmap of computation-sharing framework to efficiently process supergraph containment queries. Firstly, database graphs are clustered into disjoint groups for sharing the computation cost within each group. While it is shown NP-hard to maximize the computation-sharing benefits of a clustering, efficient algorithm is developed to approximate the optimal solution with an approximation factor of 1/2. A novel prefix-sharing indexing technique, PrefIndex, is then proposed based on which efficient query processing algorithm integrating both filtering and verification is developed. Finally, PrefIndex is enhanced with multi-level sharing and suffix-sharing to further avoid redundant computation. An extensive empirical study demonstrates the efficiency and scalability of our techniques which achieve orders of magnitudes of speed-up against the state-of-the-art techniques.

[1]  Horst Bunke,et al.  A decision tree approach to graph and subgraph isomorphism detection , 1999, Pattern Recognit..

[2]  Jeffrey Xu Yu,et al.  Taming verification hardness: an efficient algorithm for testing subgraph isomorphism , 2008, Proc. VLDB Endow..

[3]  Philip S. Yu,et al.  GString: A Novel Approach for Efficient Search in Graph Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[4]  Wilfred Ng,et al.  Fg-index: towards verification-free query processing on graph databases , 2007, SIGMOD '07.

[5]  Ambuj K. Singh,et al.  Closure-Tree: An Index Structure for Graph Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[6]  Jeffrey Xu Yu,et al.  Fast Frequent Free Tree Mining in Graph Databases , 2006, ICDM Workshops.

[7]  Lei Zou,et al.  A novel spectral coding in a large graph database , 2008, EDBT '08.

[8]  Alberto Del Bimbo,et al.  Efficient Matching and Indexing of Graph Models in Content-Based Retrieval , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[10]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[11]  Philip S. Yu,et al.  Towards Graph Containment Search and Indexing , 2007, VLDB.

[12]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[13]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[14]  Wei Wang,et al.  Graph Database Indexing Using Structured Graph Decomposition , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[15]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[16]  Jianzhong Li,et al.  A novel approach for efficient supergraph query processing on graph databases , 2009, EDBT '09.

[17]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.