Top-k subgraph matching query in a large graph

Recently, due to its wide applications, subgraph search has attracted a lot of attention from database and data mining community. Sub-graph search is defined as follows: given a query graph Q, we report all data graphs containing Q in the database. However, there is little work about sub-graph search in a single large graph, which has been used in many applications, such as biological network and social network. In this paper, we address top-k sub-graph matching query problem, which is defined as follows: given a query graph Q, we locate top-k matchings of Q in a large data graph G according to a score function. The score function is defined as the sum of the pairwise similarity between a vertex in Q and its matching vertex in G. Specifically, we first design a balanced tree (that is G-Tree) to index the large data graph. Then, based on G-Tree, we propose an efficient query algorithm (that is Ranked Matching algorithm). Our extensive experiment results show that, due to efficiency of pruning strategy, given a query with up to 20 vertices, we can locate the top-100 matchings in less than 10 seconds in a large data graph with 100K vertices. Furthermore, our approach outperforms the alternative method by orders of magnitude.

[1]  Ambuj K. Singh,et al.  Closure-Tree: An Index Structure for Graph Queries , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[2]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[3]  Mario Vento,et al.  A (sub)graph isomorphism algorithm for matching large graphs , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[5]  Sing-Hoi Sze,et al.  Path Matching and Graph Matching in Biological Networks , 2007, J. Comput. Biol..

[6]  Scott Fortin The Graph Isomorphism Problem , 1996 .

[7]  Duncan J. Watts,et al.  Six Degrees: The Science of a Connected Age , 2003 .

[8]  Mario Vento,et al.  Thirty Years Of Graph Matching In Pattern Recognition , 2004, Int. J. Pattern Recognit. Artif. Intell..

[9]  Jonathan Pevsner,et al.  Basic Local Alignment Search Tool (BLAST) , 2005 .

[10]  Wilfred Ng,et al.  Fg-index: towards verification-free query processing on graph databases , 2007, SIGMOD '07.

[11]  Wei Wang,et al.  Graph Database Indexing Using Structured Graph Decomposition , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[13]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[14]  Ulf Leser,et al.  Fast and practical indexing and querying of very large graphs , 2007, SIGMOD '07.

[15]  Vipin Kumar,et al.  Multilevel Graph Partitioning Schemes , 1995, ICPP.

[16]  Cristina G. Fernandes,et al.  Motif Search in Graphs: Application to Metabolic Networks , 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[17]  Shijie Zhang,et al.  TreePi: A Novel Graph Indexing Method , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18]  George Karypis,et al.  Finding Frequent Patterns in a Large Sparse Graph* , 2005, Data Mining and Knowledge Discovery.

[19]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[20]  Jiawei Han,et al.  Community Mining from Multi-relational Networks , 2005, PKDD.

[21]  Albert-László Barabási,et al.  Hierarchical organization in complex networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[22]  Thomas R. Hagadone,et al.  Molecular Substructure Similarity Searching: Efficient Retrieval in Two-Dimensional Structure Databases. , 1993 .

[23]  Philip S. Yu,et al.  GString: A Novel Approach for Efficient Search in Graph Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[24]  Andrei Z. Broder,et al.  Graph structure in the Web , 2000, Comput. Networks.

[25]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS.

[26]  Roded Sharan,et al.  Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks , 2006, J. Comput. Biol..