Mining and Indexing Graphs for Supergraph Search

We study supergraph search (SPS): given a query graph q and a graph database G containing a collection of graphs, return the graphs in G that have q as a supergraph. SPS has broad applications in bioinformatics, cheminformatics, and other scientific and commercial fields. Determining whether one graph is a subgraph (or supergraph) of another is an NP-complete problem. Hence, computing SPS over large graph databases is intractable. Two separate indexing methods, a "filter + verify"-based method and a "prefix-sharing"-based method, have been studied to compute SPS efficiently. To implement these two methods, subgraph patterns are mined from the graph database to build an index. Those subgraphs are mined to optimize either the filtering gain or the prefix-sharing gain, but no single subgraph-mining algorithm considers both gains. This work is the first to mine subgraphs that optimize both the filtering gain and the prefix-sharing gain while processing SPS queries. First, we show that the subgraph-mining problem is NP-hard. Then, we propose two polynomial-time algorithms that solve the problem with approximation ratios of 1-1/e and 1/4, respectively. In addition, we construct a lattice-like index, LW-index, to organize the selected subgraph patterns for fast index lookup. Our experiments show that our approach improves the query processing time for SPS queries by a factor of 3 to 10.
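As a rough illustration of where a 1-1/e ratio can come from, the sketch below runs the classical greedy rule for a coverage-style objective: repeatedly pick the pattern with the largest marginal gain. This is only an assumption about the general flavor of such an algorithm, not the method of this work; the function `greedy_select`, the pattern names, and the integer "gain units" (standing in for whatever filtering or prefix-sharing gain a pattern contributes) are all hypothetical.

```python
def greedy_select(candidates, k):
    """Greedily pick up to k patterns by marginal coverage gain.

    candidates: dict mapping a hypothetical pattern name to the set of
    gain units it covers. For monotone submodular objectives such as
    coverage, this greedy rule is known to reach at least a 1 - 1/e
    fraction of the optimum under a cardinality constraint.
    """
    selected = []
    covered = set()
    remaining = dict(candidates)  # shallow copy; the input dict is not modified
    for _ in range(k):
        best, best_gain = None, 0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = len(remaining[name] - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining pattern adds any new coverage
            break
        selected.append(best)
        covered |= remaining.pop(best)
    return selected, covered


# Toy input (made up for illustration): four patterns over six gain units.
candidates = {
    "p1": {1, 2, 3, 4},
    "p2": {3, 4, 5},
    "p3": {5, 6},
    "p4": {1, 6},
}
selected, covered = greedy_select(candidates, 2)
print(selected, sorted(covered))
```

With k = 2, the first pick is the pattern with the largest set, and the second pick is whichever pattern adds the most units not yet covered, which need not be the second-largest set.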
