Updating Graph Indices with a One-Pass Algorithm

Indices are commonly built into graph databases to support fast searches. Both the database and the distribution of queries change over time, so the cost of processing queries with a static graph index increases, because the index was built to optimize old snapshots of the database. There is growing research interest in how to update a graph index so that it adapts to database and query changes. Updating the features in a graph index is typically an NP-hard problem. In addition, because the features are chosen from a large number of frequent subgraphs, a multi-pass algorithm does not scale to big datasets. To address this issue, we propose a time-efficient one-pass algorithm that updates a graph index by scanning each frequent subgraph at most once. The algorithm replaces a feature with a new subgraph if the latter is ``better'' than the former. We use the branch-and-bound technique to skip subgraphs that cannot outperform any of the features in the graph index. We further use a decomposed index to reduce the space complexity from O(|G||Q|) to O(|G| + |Q|), where G is the set of database graphs and Q is the query workload. Our empirical study shows that the one-pass algorithm is 5--100 times faster than all previous algorithms for updating graph indices. In addition, the one-pass algorithm is guaranteed to return a close-to-optimum solution. Our experiments show that when the one-pass algorithm is used to update an index, query processing is 1--2 times faster than with other cutting-edge indices, i.e., the FGindex and the gIndex.
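The replace-if-better step and the pruning rule can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each feature is summarized by the set of abstract "items" it covers, that the index objective is the number of distinct items covered, and that the index size is fixed. The names `one_pass_update`, `total_coverage`, and `unique_counts` are hypothetical, and the size-based bound is a simple stand-in for the branch-and-bound test mentioned above.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple

Coverage = Set[Hashable]


def total_coverage(index: Dict[str, Coverage]) -> int:
    """Number of distinct items covered by all features in the index."""
    covered: Coverage = set()
    for items in index.values():
        covered |= items
    return len(covered)


def unique_counts(index: Dict[str, Coverage]) -> Dict[str, int]:
    """For each feature, count the items that no other feature covers."""
    counts = Counter(item for items in index.values() for item in items)
    return {
        name: sum(1 for item in items if counts[item] == 1)
        for name, items in index.items()
    }


def one_pass_update(
    index: Dict[str, Coverage],
    candidates: Iterable[Tuple[str, Coverage]],
) -> Dict[str, Coverage]:
    """Scan each candidate once; swap it in only if coverage strictly grows.

    Assumption: the objective is plain set coverage, which is a
    simplification of the objective used for real graph indices.
    """
    index = {name: set(items) for name, items in index.items()}
    for cand_name, cand_items in candidates:
        if not index or cand_name in index:
            continue
        # Pruning bound: swapping out a victim loses its unique items and
        # gains at most len(cand_items), so a candidate no larger than the
        # smallest unique contribution cannot improve on any feature.
        if len(cand_items) <= min(unique_counts(index).values()):
            continue
        current = total_coverage(index)
        best_gain, best_victim = 0, None
        for victim in index:
            trial = {n: s for n, s in index.items() if n != victim}
            trial[cand_name] = set(cand_items)
            gain = total_coverage(trial) - current
            if gain > best_gain:
                best_gain, best_victim = gain, victim
        if best_victim is not None:
            del index[best_victim]
            index[cand_name] = set(cand_items)
    return index
```

For example, calling `one_pass_update({"f1": {1, 2}, "f2": {2, 3}}, [("g1", {4, 5, 6}), ("g2", {1})])` swaps `g1` in for one existing feature and prunes `g2` without evaluating any swap. Each candidate is read once, which is the property the one-pass design relies on; a real index would replace the brute-force trial coverage with incremental bookkeeping.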
