Performance and Scalability of Indexed Subgraph Query Processing Methods

Graph data management systems have become very popular, as graphs are the natural data model for many applications. One of the main problems addressed by these systems is subgraph query processing: given a query graph, return all graphs in the dataset that contain it. The naive method for processing such queries is to perform a subgraph isomorphism test against each graph in the dataset. This does not scale, as subgraph isomorphism is NP-complete. Thus, many indexing methods have been proposed to reduce the number of candidate graphs that must undergo the subgraph isomorphism test. In this paper, we identify a set of key factors (parameters) that influence the performance of these methods, namely the number of nodes per graph, the graph density, the number of distinct labels, the number of graphs in the dataset, and the query graph size. We then conduct comprehensive and systematic experiments that analyze the sensitivity of the various methods to the values of these parameters. Our aims are twofold: first, to derive conclusions about the algorithms' relative performance; and second, to stress-test all algorithms, deriving insights into their scalability and highlighting how both performance and scalability depend on the above factors. We choose six well-established indexing methods, namely Grapes, CT-Index, GraphGrepSX, gIndex, Tree+Δ, and gCode, as representatives of the overall design space, including the most recent and best-performing methods. We report their index construction time and index size, as well as their query processing performance in terms of time and false positive ratio. We employ both real and synthetic datasets. Specifically, we use four real datasets with different characteristics: AIDS, PDBS, PCM, and PPI. In addition, we generate a large number of synthetic graph datasets, enabling us to systematically study the algorithms' performance and scalability as a function of the aforementioned key parameters.
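The filter-and-verify idea behind indexed subgraph query processing can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any of the six methods studied: the graph representation (`LabeledGraph`), the toy feature set (node-label counts plus unordered edge-label-pair counts), and the helpers `features`, `is_subgraph`, `build_index`, `query`, `false_positive_ratio`, and `make` are all hypothetical. Real indexing methods use richer features such as label paths, trees, cycles, or frequent subgraphs. The false positive ratio is taken here as the fraction of candidates that fail verification, (|C| - |A|) / |C|; this exact formula is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical undirected, node-labeled graph: node id -> label, plus adjacency sets."""
    labels: dict[int, str]
    adj: dict[int, set[int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for v in self.labels:
            self.adj.setdefault(v, set())

    def add_edge(self, u: int, v: int) -> None:
        self.adj[u].add(v)
        self.adj[v].add(u)

    def edges(self) -> set[tuple[int, int]]:
        return {(u, v) for u in self.adj for v in self.adj[u] if u < v}


def features(g: LabeledGraph) -> Counter:
    """Toy (assumed) feature multiset: node-label counts and unordered edge-label-pair counts."""
    f = Counter(("node", lab) for lab in g.labels.values())
    for u, v in g.edges():
        a, b = sorted((g.labels[u], g.labels[v]))
        f[("edge", a, b)] += 1
    return f


def is_subgraph(q: LabeledGraph, g: LabeledGraph) -> bool:
    """Backtracking test for a label-preserving, injective (non-induced) embedding of q into g."""
    order = list(q.labels)
    mapping: dict[int, int] = {}
    used: set[int] = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        u = order[i]
        for v in g.labels:
            if v in used or g.labels[v] != q.labels[u]:
                continue
            # Every already-mapped query neighbor of u must map to a neighbor of v.
            if all(mapping[w] in g.adj[v] for w in q.adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(i + 1):
                    return True
                del mapping[u]
                used.discard(v)
        return False

    return extend(0)


def build_index(db: list[LabeledGraph]) -> list[Counter]:
    """Index construction: precompute the feature multiset of every dataset graph."""
    return [features(g) for g in db]


def query(q: LabeledGraph, db: list[LabeledGraph], index: list[Counter]) -> tuple[list[int], list[int]]:
    fq = features(q)
    # Filter: a graph containing q has at least as many occurrences of every feature,
    # so this step may keep false positives but never discards a true answer.
    candidates = [i for i, fg in enumerate(index) if all(fg[k] >= c for k, c in fq.items())]
    # Verify: run the expensive isomorphism test only on the surviving candidates.
    answers = [i for i in candidates if is_subgraph(q, db[i])]
    return candidates, answers


def false_positive_ratio(candidates: list[int], answers: list[int]) -> float:
    """Assumed definition: fraction of candidates that fail verification."""
    return (len(candidates) - len(answers)) / len(candidates) if candidates else 0.0


def make(labels: str, edges: list[tuple[int, int]]) -> LabeledGraph:
    """Hypothetical helper: node i gets labels[i]; edges are (u, v) pairs."""
    g = LabeledGraph(dict(enumerate(labels)))
    for u, v in edges:
        g.add_edge(u, v)
    return g


db = [
    make("CCC", [(0, 1), (1, 2), (0, 2)]),           # triangle: a true answer
    make("CCCC", [(0, 1), (1, 2), (2, 3), (3, 0)]),  # 4-cycle: passes the filter, no triangle
    make("CCC", [(0, 1), (1, 2)]),                   # path: too few C-C edges, filtered out
    make("CCO", [(0, 1), (1, 2), (0, 2)]),           # too few C nodes, filtered out
]
index = build_index(db)
triangle = make("CCC", [(0, 1), (1, 2), (0, 2)])
candidates, answers = query(triangle, db, index)
print(candidates, answers, false_positive_ratio(candidates, answers))  # [0, 1] [0] 0.5
```

The toy dataset shows why filtering alone is not enough: the 4-cycle has enough C nodes and C-C edges to pass the feature filter, yet it contains no triangle, so only the verification step removes it. The filter is sound because an embedding maps distinct query nodes and edges to distinct, identically labeled data nodes and edges. Its value lies in how many graphs it spares from the worst-case exponential isomorphism test.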
