LOD Lab: Experiments at LOD Scale

Contemporary Semantic Web research optimizes algorithms against only a handful of datasets, such as DBpedia, BSBM, and DBLP. As a result, current practice does not generally take the true variety of Linked Data into account. With hundreds of thousands of datasets available today, the results of Semantic Web evaluations are less generalizable than they should be and, this paper argues, than they can be. This paper describes LOD Lab: a fundamentally different evaluation paradigm that makes algorithmic evaluation against hundreds of thousands of datasets the new norm. LOD Lab is implemented on top of the existing LOD Laundromat architecture, combined with Frank, a new open-source programming interface that allows Web-scale evaluations to be run from the command line. We illustrate the viability of the LOD Lab approach by rerunning experiments from three recent Semantic Web publications, and we expect it to contribute to improving the quality and reproducibility of experimental work in the Semantic Web community. We show that simply rerunning existing experiments within this new evaluation paradigm raises interesting research questions about how algorithmic performance relates to (structural) properties of the data.
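To make the paradigm concrete, a Web-scale evaluation with Frank might look roughly like the sketch below. The command names follow Frank's published interface, but the exact flags and the `evaluate.sh` script are illustrative assumptions, not details given in this abstract:

```shell
# Stream the download locations of cleaned documents published
# by the LOD Laundromat (flag names are assumptions):
frank documents | head -n 5

# Pipe every document through a hypothetical evaluation script,
# turning a single-dataset experiment into a Web-scale one:
frank documents | while read -r doc; do
  ./evaluate.sh "$doc"   # placeholder for the algorithm under test
done
```

Because Frank emits plain lines on standard output, any existing evaluation harness that reads its input from a pipe can be rerun over hundreds of thousands of datasets without modification.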