TGDB: towards a benchmark for graph databases

Graph data has become an important representation for many analytical applications, ranging from social network analysis and biological data computation to ontologies in the Semantic Web. Recently, many graph databases have been proposed to process and analyze graph data. These fall into two main approaches: the first builds a graph data model as a layer on top of an existing database (e.g., a key-value store), and the second builds a specialized native substrate for processing graph data. Consequently, data scientists now have a variety of systems and approaches to choose from, and they need a way to evaluate these options and select the one that best suits their situation. We propose TGDB, the Toronto Graph Database Benchmark. TGDB provides a query workload and real-world datasets for evaluating the performance of target systems. We select three graph databases with different system architectures and evaluate their performance against TGDB.
