Towards Graph Containment Search and Indexing

Given a set of model graphs D and a query graph q, containment search aims to find all model graphs g e D such that q contains g (q ⊇ g). Due to the wide adoption of graph models, fast containment search of graph data finds many applications in various domains. In comparison to traditional graph search that retrieves all the graphs containing q (q ⊆ g), containment search has its own indexing characteristics that have not yet been examined. In this paper, we perform a systematic study on these characteristics and propose a contrast subgraph-based indexing model, called cIndex. Contrast subgraphs capture the structure differences between model graphs and query graphs, and are thus perfect for indexing due to their high selectivity. Using a redundancy-aware feature selection process, cIndex can sort out a set of significant and distinctive contrast subgraphs and maximize its indexing capability. We show that it is NP-complete to choose the best set of indexing features, and our greedy algorithm can approximate the one-level optimal index within a ratio of 1-- 1/e. Taking this solution as a base indexing model, we further extend it to accommodate hierarchical indexing methodologies and apply data space clustering and sampling techniques to reduce the index construction time. The proposed methodology provides a general solution to containment search and indexing, not only for graphs, but also for any data with transitive relations as well. Experimental results on real test data show that cIndex achieves near-optimal pruning power on various containment search workloads, and confirms its obvious advantage over indices built for traditional graph search in this new scenario.

[1]  Horst Bunke,et al.  A decision tree approach to graph and subgraph isomorphism detection , 1999, Pattern Recognit..

[2]  Yixin Chen,et al.  Multi-Dimensional Regression Analysis of Time-Series Data Streams , 2002, VLDB.

[3]  Steven Gold,et al.  A Graduated Assignment Algorithm for Graph Matching , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[5]  Lawrence B. Holder,et al.  Substucture Discovery in the SUBDUE System , 1994, KDD Workshop.

[6]  Wojciech Szpankowski,et al.  Pairwise Local Alignment of Protein Interaction Networks Guided by Models of Evolution , 2005, RECOMB.

[7]  H. Bunke Graph Matching : Theoretical Foundations , Algorithms , and Applications , 2022 .

[8]  Philip S. Yu,et al.  ViCo: an adaptive distributed video correlation system , 2006, MM '06.

[9]  Hans-Arno Jacobsen,et al.  G-ToPSS: fast filtering of graph-based metadata , 2005, WWW '05.

[10]  Dorit S. Hochba,et al.  Approximation Algorithms for NP-Hard Problems , 1997, SIGA.

[11]  James Bailey,et al.  Mining Minimal Contrast Subgraph Patterns , 2006, SDM.

[12]  Mario Vento,et al.  Thirty Years Of Graph Matching In Pattern Recognition , 2004, Int. J. Pattern Recognit. Artif. Intell..

[13]  Roy Goldman,et al.  DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases , 1997, VLDB.

[14]  Shih-Fu Chang,et al.  Detecting image near-duplicate by stochastic attributed relational graph matching with learning , 2004, MULTIMEDIA '04.

[15]  Andrew Lim,et al.  D(k)-index: an adaptive structural summary for graph-structured data , 2003, SIGMOD '03.

[16]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[17]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[18]  Philip S. Yu,et al.  Substructure similarity search in graph databases , 2005, SIGMOD '05.

[19]  Ambuj K. Singh,et al.  Closure-Tree: An Index Structure for Graph Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[20]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[21]  Joost N. Kok,et al.  A quickstart in frequent structure mining can make a difference , 2004, KDD.

[22]  King-Sun Fu,et al.  A Step Towards Unification of Syntactic and Statistical Pattern Recognition , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Jignesh M. Patel,et al.  SAGA: a subgraph matching tool for biological graphs , 2007, Bioinform..

[24]  T. V. Lakshman,et al.  Gigabit rate packet pattern-matching using TCAM , 2004, Proceedings of the 12th IEEE International Conference on Network Protocols, 2004. ICNP 2004..

[25]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[26]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[27]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.