A Comparison of Data Structures to Manage URIs on the Web of Data

Uniform Resource Identifiers (URIs) are one of the cornerstones of the Web. They are also exceedingly important on the Web of Data, since RDF graphs and Linked Data both rely heavily on URIs to uniquely identify and connect entities. Because of their hierarchical structure and their string serialization, sets of related URIs typically contain a high degree of redundant information and are systematically dictionary-compressed or encoded at the back-end (e.g., in the triple store). This paper presents, to the best of our knowledge, the first systematic comparison of the most common data structures used to encode URI data. We evaluate a series of data structures in terms of their read/write performance and memory consumption.
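
To make the notion of dictionary encoding concrete, the following is a minimal sketch in Python and not one of the data structures evaluated in the paper. It assumes a hash table for the URI-to-ID direction and a plain list for the ID-to-URI direction; the class name `UriDictionary` and the `example.org` URIs are hypothetical and chosen only for illustration.

```python
import os


class UriDictionary:
    """Bidirectional mapping between URI strings and dense integer IDs.

    Illustrative only: a hash table (dict) for URI -> ID and a list for
    ID -> URI. Real triple stores may use tries, B-trees, or other
    structures instead.
    """

    def __init__(self):
        self._id_of = {}   # URI -> ID
        self._uri_of = []  # ID -> URI (the list position is the ID)

    def encode(self, uri):
        """Return the ID of a URI, assigning the next free ID if it is new."""
        uri_id = self._id_of.get(uri)
        if uri_id is None:
            uri_id = len(self._uri_of)
            self._id_of[uri] = uri_id
            self._uri_of.append(uri)
        return uri_id

    def decode(self, uri_id):
        """Return the URI string stored under an ID."""
        return self._uri_of[uri_id]


# Hypothetical URIs that share a long prefix, as related URIs typically do.
uris = [
    "http://example.org/people/alice",
    "http://example.org/people/bob",
    "http://example.org/places/paris",
]

dictionary = UriDictionary()
ids = [dictionary.encode(u) for u in uris]

# Character-wise common prefix: the redundancy that prefix-aware
# structures (e.g., tries) can store only once.
shared_prefix = os.path.commonprefix(uris)
```

Once encoded, the store can operate on small integers rather than long strings. The shared prefix hints at why structures that factor out common prefixes may use less memory than a plain hash table, at the cost of different read/write behavior.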
