Runtime Characterization of Triple Stores

As the Semantic Web becomes mainstream, the performance of triple stores becomes increasingly important. To date, various benchmarks and experiments have evaluated the response time and query throughput of individual stores to expose the strengths and weaknesses of triple store implementations. However, these evaluations have focused primarily on the application level and have not sufficiently investigated system-level aspects to uncover performance inhibitors and bottlenecks. In this paper, we propose metrics based on a systematic study of the impact of triple store implementations on the underlying platform. We select several popular triple stores as case studies and run our experiments on a standard platform (128 GB RAM, 12 cores) and an enterprise platform (768 GB RAM, 40 cores). Through detailed measurements of the time cost and system resource consumption of queries derived from the Berlin SPARQL Benchmark (BSBM), we describe the dynamics and behavior of query execution across these systems. The collected data provides insight into the different triple store implementations as well as into the performance differences between the two platforms. The results help identify performance bottlenecks in existing triple store implementations, which may be useful in future design efforts for Linked Data processing.
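The distinction between application-level measures (response time, throughput) and system-level measures (resource consumption) can be illustrated with a minimal Python sketch. It is not the authors' harness: `run_query` is a hypothetical stand-in for whatever client call submits one BSBM-style query, and the CPU figures assume the store runs embedded in the same process, since `os.times()` only reports the CPU time of the calling process.

```python
import os
import statistics
import time
from typing import Callable, Dict, List


def profile_query(run_query: Callable[[], object], repetitions: int = 10) -> Dict[str, float]:
    """Run a query callable repeatedly; report wall-clock and CPU cost.

    run_query is a hypothetical placeholder for submitting one query to the
    store under test. CPU time covers only this process, so it reflects the
    store's cost only if the store is embedded in-process (an assumption).
    """
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    wall: List[float] = []
    cpu: List[float] = []
    for _ in range(repetitions):
        t0 = os.times()
        w0 = time.perf_counter()
        run_query()
        w1 = time.perf_counter()
        t1 = os.times()
        # Application-level measure: elapsed (response) time of one execution.
        wall.append(w1 - w0)
        # System-level measure: user + system CPU seconds of this process.
        cpu.append((t1.user - t0.user) + (t1.system - t0.system))
    total_wall = sum(wall)
    return {
        "mean_response_s": statistics.mean(wall),
        "median_response_s": statistics.median(wall),
        # Sequential throughput: queries completed per second of wall time.
        "throughput_qps": repetitions / total_wall if total_wall > 0 else float("inf"),
        "mean_cpu_s": statistics.mean(cpu),
    }


if __name__ == "__main__":
    # A CPU-bound placeholder workload stands in for a real SPARQL query.
    print(profile_query(lambda: sum(range(100_000)), repetitions=5))
```

A large gap between response time and CPU time in such a profile would point to waiting (for example on I/O or locks) rather than computation, which is the kind of system-level signal that application-level benchmarks alone do not reveal.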
