Towards Emulation of Large Scale Complex Network Workloads on Graph Databases with XGDBench

Graph database systems have recently attracted considerable attention from the big data management community because of their efficient graph data storage and powerful graph query specification capabilities. In this paper we present a methodology for modeling workload spikes in a graph database system using a scalable benchmarking framework called XGDBench. We describe how two main types of workload spikes, data spikes and volume spikes, can be implemented in the context of graph databases, drawing on real-world workload traces and empirical evidence. We implemented these features in XGDBench, which we developed using X10, and validated them by running workloads on Titan, a popular open source distributed graph database server. We observed that XGDBench is able to generate realistic workload spikes on Titan. The distributed architecture of XGDBench allows such techniques to be implemented efficiently by exploiting the computing power of distributed memory compute clusters.
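To make the two spike types concrete, the following is a minimal Python sketch, not XGDBench's actual implementation (which is written in X10). It assumes that a volume spike is a temporary multiplication of the aggregate request rate and that a data spike is a temporary concentration of requests on a small set of "hot" vertices. All function names, parameters, and default values here are hypothetical and chosen only for illustration.

```python
import random


def request_rate(t, base_rate=100, spike_start=30, spike_end=40, magnitude=5.0):
    """Volume spike (assumed model): the number of requests per time step
    is multiplied by `magnitude` while t is inside the spike window."""
    if spike_start <= t < spike_end:
        return int(base_rate * magnitude)
    return base_rate


def pick_vertex(rng, num_vertices, t, hot_set,
                spike_start=30, spike_end=40, hot_fraction=0.8):
    """Data spike (assumed model): inside the spike window, roughly
    `hot_fraction` of requests target a small set of hot vertices;
    otherwise vertices are chosen uniformly at random."""
    if spike_start <= t < spike_end and rng.random() < hot_fraction:
        return rng.choice(hot_set)
    return rng.randrange(num_vertices)


def generate_trace(steps=60, num_vertices=10_000, hot_set=(1, 2, 3), seed=42):
    """Return one list of requested vertex ids per time step."""
    rng = random.Random(seed)  # seeded so the trace is reproducible
    trace = []
    for t in range(steps):
        n = request_rate(t)
        trace.append([pick_vertex(rng, num_vertices, t, hot_set)
                      for _ in range(n)])
    return trace
```

In this sketch both spikes share one time window for brevity. A benchmark client would replay each step of the trace against the database under test, and the two effects could equally be driven by separate windows or by parameters fitted to real traces.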
