Efficient query processing on graph databases

We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem. The cost of processing a subgraph query using most existing indexes consists mainly of two parts: the index probing cost and the candidate verification cost. Index probing either finds the query in the index or finds the graphs from which a candidate answer set for the query can be generated. Candidate verification tests whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize both costs. FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) so that queries that are FGs can be answered without candidate verification; we call these queries FG-queries. Enlarging the set of FG-queries lets more queries be answered without candidate verification, but a larger set of FG-queries implies a larger FG-index and hence a higher index probing cost. We propose the feature-index to reduce the index probing cost: it uses features to filter out false results matched in the FG-index, so that the truly matching graphs for a query can be found quickly. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, no verification is required for processing FAQs, and only a small number of candidates need to be verified for non-FG-queries that are not frequently asked.
Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than with state-of-the-art indexes, and that it is also more scalable.
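
The filter-and-verify cost model above can be made concrete with a small Python sketch. It is a toy, not the FG*-index from the article, and every name in it (`Graph`, `edge_features`, `is_subgraph`, `is_isomorphic`, `ToyIndex`, `faq_threshold`) is hypothetical. Several simplifications are assumptions: the caller supplies the patterns that play the role of FG-queries instead of mining frequent subgraphs, labeled-edge counts stand in for the article's features, index probing is a linear scan with isomorphism tests, and a non-FG-query is promoted to the verification-free store after `faq_threshold` asks. Verification is a brute-force backtracking subgraph-isomorphism test, which is exponential in the worst case.

```python
# Toy filter-and-verify index; hypothetical names, not the article's FG*-index.
from collections import Counter


class Graph:
    """Undirected vertex-labeled graph; labels[v] is the label of vertex v."""

    def __init__(self, labels, edges):
        self.labels = tuple(labels)
        self.edges = frozenset((min(u, v), max(u, v)) for u, v in edges)

    def has_edge(self, u, v):
        return (min(u, v), max(u, v)) in self.edges


def edge_features(g):
    """Multiset of sorted (label, label) pairs, one per edge: an assumed stand-in feature."""
    return Counter(tuple(sorted((g.labels[u], g.labels[v]))) for u, v in g.edges)


def is_subgraph(q, g):
    """Verification: is there an injective, label-preserving map of q's vertices into g
    that sends every edge of q to an edge of g (non-induced subgraph isomorphism)?"""
    n, m = len(q.labels), len(g.labels)
    if n > m or len(q.edges) > len(g.edges):
        return False
    mapping, used = [], set()

    def extend(v):  # backtracking: map query vertex v, given vertices 0..v-1 are mapped
        if v == n:
            return True
        for w in range(m):
            if w in used or g.labels[w] != q.labels[v]:
                continue
            if all(g.has_edge(mapping[u], w) for u in range(v) if q.has_edge(u, v)):
                mapping.append(w)
                used.add(w)
                if extend(v + 1):
                    return True
                mapping.pop()
                used.discard(w)
        return False

    return extend(0)


def is_isomorphic(a, b):
    # An injective embedding between graphs with equal vertex and edge counts is a bijection.
    return (len(a.labels) == len(b.labels) and len(a.edges) == len(b.edges)
            and is_subgraph(a, b))


class ToyIndex:
    def __init__(self, db, fg_patterns, faq_threshold=2):
        self.db, self.faq_threshold = db, faq_threshold
        self.feats = [edge_features(g) for g in db]
        # Verification-free store of (pattern, features, exact answer set), seeded with FG patterns.
        self.exact = [(p, edge_features(p), frozenset(i for i, g in enumerate(db) if is_subgraph(p, g)))
                      for p in fg_patterns]
        self.asked = []         # [query, features, times asked] for non-FG queries
        self.verifications = 0  # candidate verifications performed at query time

    def query(self, q):
        fq = edge_features(q)
        # Index probing: linear scan for an isomorphic stored pattern (features compared first).
        for p, fp, ans in self.exact:
            if fp == fq and is_isomorphic(q, p):
                return ans  # FG-query or FAQ: no candidate verification
        # Filtering: a supergraph of q must contain q's edge-feature multiset.
        cands = [i for i, fg in enumerate(self.feats) if all(fg[k] >= c for k, c in fq.items())]
        self.verifications += len(cands)
        ans = frozenset(i for i in cands if is_subgraph(q, self.db[i]))
        for entry in self.asked:
            if entry[1] == fq and is_isomorphic(q, entry[0]):
                break
        else:
            entry = [q, fq, 0]
            self.asked.append(entry)
        entry[2] += 1
        if entry[2] == self.faq_threshold:  # frequently asked: store its exact answer
            self.exact.append((q, fq, ans))
        return ans


db = [Graph("CCO", [(0, 1), (1, 2)]),           # C-C-O
      Graph("CCOO", [(0, 1), (1, 2), (1, 3)]),  # C-C(-O)-O
      Graph("CO", [(0, 1)])]                    # C-O
co = Graph("OC", [(0, 1)])
cco = Graph("CCO", [(0, 1), (1, 2)])
idx = ToyIndex(db, fg_patterns=[co])
print(sorted(idx.query(co)), idx.verifications)   # [0, 1, 2] 0  (FG-query)
print(sorted(idx.query(cco)), idx.verifications)  # [0, 1] 2     (filtered + verified)
print(sorted(idx.query(cco)), idx.verifications)  # [0, 1] 4     (2nd ask: promoted to FAQ)
print(sorted(idx.query(Graph("OCC", [(0, 1), (1, 2)]))), idx.verifications)  # [0, 1] 4
```

The `verifications` counter separates the two costs: the FG-query and the promoted FAQ (including the isomorphic query O-C-C) are answered by probing alone, while each of the first two asks of C-C-O pays for filtering plus two verifications.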