A Discussion on the Design of Graph Database Benchmarks

Graph database management systems (GDBs) are gaining popularity. They are used to analyze the huge graph datasets that arise naturally in many application areas where interrelated data must be modeled. The objective of this paper is to open a new topic of discussion in the benchmarking community and to give practitioners a set of basic guidelines for GDB benchmarking. We strongly believe that GDBs will become an important player in the data analysis market and that, as a result, their performance and capabilities will become important as well. For this reason, we discuss the aspects that we consider most relevant: the characteristics of the graphs to be included in the benchmark, the characteristics of the queries that matter in graph analysis applications, and the evaluation workbench.
