MSP: Multiple Sub-graph Query Processing using Structure-based Graph Partitioning Strategy and Map-Reduce

Abstract In a distributed environment, graph databases grow quickly because graphs arrive from many autonomous sources, which makes subgraph query processing a challenging problem. Centralized approaches have produced many algorithms that mine frequent subgraphs from the graph database and build an index from them, which is very expensive. These algorithms require many database scans to mine frequent subgraphs, and their filter-and-verify strategy requires many subgraph isomorphism tests. In this paper, we design a novel Map-Reduce based framework for processing multiple subgraph queries, namely MSP. MSP processes multiple graph queries using a distributed index, and the framework relies entirely on graph partitioning and indexing. To improve its performance, we propose several solutions that balance the workload and reduce the size of the Integrated Graph Index. We propose a structure-based partitioning technique and a distributed method for building the Integrated Graph Index. This work uses two Map-Reduce rounds: the first round partitions the graphs and creates an index for each partition, and the second round processes subgraph queries and maintains the index. A good partitioning reduces the index size by distributing the load equally across the machines in the cluster, and it improves query evaluation performance. Together, the graph partitioning and the Integrated Graph Index reduce the search space for query graphs. Our approach allows data graphs to be added incrementally to the Integrated Graph Index while queries are being processed. We show experimentally that our approach markedly decreases execution time and scales subgraph query processing to large graph databases.
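The two-round flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the MSP implementation: the abstract does not specify the partitioning rule or the layout of the Integrated Graph Index, so the partition key (edge count modulo the number of partitions), the feature choice (pairs of endpoint labels), the brute-force isomorphism test, and all function names below are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import permutations

# A graph is (labels, edges): labels maps vertex -> label, edges is a list of (u, v).

def edge_features(graph):
    """Assumed index feature: the set of sorted endpoint-label pairs."""
    labels, edges = graph
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def partition_key(graph, num_partitions):
    """Hypothetical structural key: edge count modulo the partition count."""
    _, edges = graph
    return len(edges) % num_partitions

def is_subgraph(query, data):
    """Brute-force label-preserving subgraph test (only viable for tiny graphs)."""
    q_labels, q_edges = query
    d_labels, d_edges = data
    d_adj = {frozenset(e) for e in d_edges}
    q_nodes = list(q_labels)
    for image in permutations(d_labels, len(q_nodes)):
        m = dict(zip(q_nodes, image))
        if all(q_labels[v] == d_labels[m[v]] for v in q_nodes) and all(
            frozenset((m[u], m[v])) in d_adj for u, v in q_edges
        ):
            return True
    return False

def round_one(database, num_partitions):
    """Round 1: map graphs to partitions, then build one inverted index per partition."""
    partitions = defaultdict(dict)
    for gid, graph in database.items():  # map phase
        partitions[partition_key(graph, num_partitions)][gid] = graph
    indexes = {}
    for pid, graphs in partitions.items():  # reduce phase
        index = defaultdict(set)
        for gid, graph in graphs.items():
            for feature in edge_features(graph):
                index[feature].add(gid)
        indexes[pid] = dict(index)
    return dict(partitions), indexes

def round_two(partitions, indexes, queries):
    """Round 2: per partition, filter candidates with the index, then verify."""
    answers = {qid: set() for qid in queries}
    for pid, graphs in partitions.items():
        for qid, query in queries.items():
            candidates = set(graphs)
            for feature in edge_features(query):
                candidates &= indexes[pid].get(feature, set())
            answers[qid] |= {g for g in candidates if is_subgraph(query, graphs[g])}
    return answers

database = {
    "g1": ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),
    "g2": ({0: "A", 1: "B"}, [(0, 1)]),
    "g3": ({0: "C", 1: "C", 2: "A"}, [(0, 1), (1, 2)]),
}
queries = {
    "q1": ({0: "A", 1: "B"}, [(0, 1)]),
    "q2": ({0: "B", 1: "C"}, [(0, 1)]),
}

partitions, indexes = round_one(database, num_partitions=2)
answers = round_two(partitions, indexes, queries)
print(answers)
```

In this sketch the index only prunes candidates, since a data graph can contain a query only if it contains every endpoint-label pair of the query, and the isomorphism test still decides the final answer. Adding a new data graph would amount to routing it with `partition_key` and updating that one partition's index, which is one way to read the incremental maintenance mentioned in the abstract.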
