An Efficient Graph Indexing Method

Graphs are popular models for representing complex structure data and similarity search for graphs has become a fundamental research problem. Many techniques have been proposed to support similarity search based on the graph edit distance. However, they all suffer from certain drawbacks: high computational complexity, poor scalability in terms of database size, or not taking full advantage of indexes. To address these problems, in this paper, we propose SEGOS, an indexing and query processing framework for graph similarity search. First, an effective two-level index is constructed off-line based on sub-unit decomposition of graphs. Then, a novel search strategy based on the index is proposed. Two algorithms adapted from TA and CA methods are seamlessly integrated into the proposed strategy to enhance graph search. More specially, the proposed framework is easy to be pipelined to support continuous graph pruning. Extensive experiments are conducted on two real datasets to evaluate the effectiveness and scalability of our approaches.

[1]  David J. DeWitt,et al.  On supporting containment queries in relational database management systems , 2001, SIGMOD '01.

[2]  Anthony K. H. Tung,et al.  Comparing Stars: On Approximating Graph Edit Distance , 2009, Proc. VLDB Endow..

[3]  Bin Wang,et al.  VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams , 2007, VLDB.

[4]  Dennis Shasha,et al.  GraphGrep: A fast and universal method for querying graphs , 2002, Object recognition supported by user interaction for service robots.

[5]  Ge Yu,et al.  Efficiently Indexing Large Sparse Graphs for Similarity Search , 2012, IEEE Transactions on Knowledge and Data Engineering.

[6]  Ambuj K. Singh,et al.  Closure-Tree: An Index Structure for Graph Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[7]  Xuelong Li,et al.  A survey of graph edit distance , 2010, Pattern Analysis and Applications.

[8]  Jiawei Han,et al.  Mining coherent dense subgraphs across massive biological networks for functional discovery , 2005, ISMB.

[9]  Horst Bunke,et al.  A graph distance metric based on the maximal common subgraph , 1998, Pattern Recognit. Lett..

[10]  Jignesh M. Patel,et al.  TALE: A Tool for Approximate Large Graph Matching , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[11]  Horst Bunke,et al.  A New Algorithm for Error-Tolerant Subgraph Isomorphism Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Wilfred Ng,et al.  Fg-index: towards verification-free query processing on graph databases , 2007, SIGMOD '07.

[13]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[14]  Salih O. Duffuaa,et al.  A Linear Programming Approach for the Weighted Graph Matching Problem , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Anthony K. H. Tung,et al.  Similarity evaluation on tree-structured data , 2005, SIGMOD '05.

[16]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[17]  Luis Gravano,et al.  Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.

[18]  Ismail Hakki Toroslu,et al.  Incremental assignment problem , 2007, Inf. Sci..

[19]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[20]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..

[21]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[22]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS.

[23]  Philip S. Yu,et al.  Substructure similarity search in graph databases , 2005, SIGMOD '05.

[24]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[25]  Kaspar Riesen,et al.  Fast Suboptimal Algorithms for the Computation of Graph Edit Distance , 2006, SSPR/SPR.

[26]  Gabriel Valiente,et al.  A graph distance metric combining maximum common subgraph and minimum common supergraph , 2001, Pattern Recognit. Lett..

[27]  Alfred O. Hero,et al.  A binary linear programming formulation of the graph edit distance , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1984, TOPL.

[29]  Jan O. Pedersen,et al.  Optimization for dynamic inverted index maintenance , 1989, SIGIR '90.