HBase System-Based Distributed Framework for Searching Large Graph Databases

In recent years, graphs have become popular in a variety of domains, and the volume of graph databases has increased rapidly. As a result, large amounts of graph data need to be indexed, queried, and stored. To address graph querying and storage for large graph databases, we present a distributed graph searching framework based on the well-known storage system HBase. We improve the querying and indexing methods of GiS [4] to handle queries in a distributed environment. To ensure that graph data are well distributed, we design a distributed graph indexing technique that uses a line graph signature to index graphs in HBase. Experiments on both real and synthetic databases demonstrate that the proposed framework is an efficient distributed solution for querying subgraphs in a large volume of graph data.

[1] Robin J. Wilson. Introduction to Graph Theory, 1974.

[2] Jianzhong Li, et al. A novel approach for efficient supergraph query processing on graph databases, 2009, EDBT '09.

[3] Ambuj K. Singh, et al. Closure-Tree: An Index Structure for Graph Queries, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[4] Ambuj K. Singh, et al. Graphs-at-a-time: query language and access methods for graph databases, 2008, SIGMOD Conference.

[5] Jiawei Han, et al. On graph query optimization in large networks, 2010, Proc. VLDB Endow.

[6] Philip S. Yu, et al. Graph indexing: a frequent structure-based approach, 2004, SIGMOD '04.

[7] A. Broder. Some applications of Rabin’s fingerprinting method, 1993.

[8] Wilfred Ng, et al. Fg-index: towards verification-free query processing on graph databases, 2007, SIGMOD '07.

[9] Hang Lau, et al. A Java Library of Graph Algorithms and Optimization (Discrete Mathematics and Its Applications), 2006.

[10] Philip S. Yu, et al. Graph Indexing: Tree + Delta >= Graph, 2007, VLDB.

[11] Shijie Zhang, et al. TreePi: A Novel Graph Indexing Method, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12] Wilson C. Hsieh, et al. Bigtable: A Distributed Storage System for Structured Data, 2006, TOCS.

[13] Lei Zou, et al. A novel spectral coding in a large graph database, 2008, EDBT '08.

[14] J. A. Hartigan, et al. A k-means clustering algorithm, 1979.

[15] Julian R. Ullmann, et al. An Algorithm for Subgraph Isomorphism, 1976, J. ACM.

[16] Joseph M. Hellerstein, et al. The RD-Tree: An Index Structure for Sets, 1997.

[17] Praveen R. Rao, et al. A tool for fast indexing and querying of graphs, 2011, WWW.