Querying Wikidata: Comparing SPARQL, Relational and Graph Databases

In this paper, we experimentally compare the efficiency of various database engines for querying the Wikidata knowledge base, which can be conceptualised as a directed edge-labelled graph whose edges can be annotated with meta-information called qualifiers. We take two popular SPARQL databases (Virtuoso, Blazegraph), a popular relational database (PostgreSQL), and a popular graph database (Neo4j) for comparison, and discuss various options for representing Wikidata in the data model of each engine. We design a set of experiments to test the relative query performance of these representations in the context of their respective engines. We first execute a large set of atomic lookups to establish a baseline performance for each test setting, and subsequently perform experiments on instances of more complex graph patterns based on real-world examples. We conclude with a summary of the strengths and limitations observed for each engine.
