Benchmarking Graph Database Backends - What Works Well with Wikidata?

Knowledge bases often use a graph as their logical model. RDF-based knowledge bases (KBs) are prime examples, since RDF (Resource Description Framework) itself uses a graph as its logical model. Graph databases are an emerging breed of NoSQL-type databases that also offer a graph as the logical model. Although there are specialized databases for storing RDF data, the so-called triple stores, graph databases can also be promising candidates for storing knowledge. In this paper, we benchmark different graph database implementations loaded with Wikidata, a real-life, large-scale knowledge base. Graph databases come in all shapes and sizes and offer different APIs and graph models. Hence, we used a measurement system that can abstract away the API differences. For the modeling aspect, we made measurements with different graph encodings previously suggested in the literature, in order to observe the impact of the encoding on overall performance.
