WaterFowl, a Compact, Self-indexed RDF Store with Inference-enabled Dictionaries

In this paper, we present a novel approach, called WaterFowl, for the storage of RDF triples that addresses some key issues in the contexts of big data and the Semantic Web. The architecture of our prototype, largely based on the use of succinct data structures, enables the representation of triples in a self-indexed, compact manner without requiring decompression at query answering time. Moreover, it is adapted to efficiently support RDF and RDFS entailment regimes thanks to an optimized encoding of ontology concepts and properties that does not require a complete inference materialization or extensive query rewriting algorithms. This approach requires distinguishing between the terminological (TBox) and the assertional (ABox) components of the knowledge base early in the data preparation process, i.e., when preprocessing the data before storing it in our structures. The paper describes the complete architecture of this system and presents some preliminary results obtained from evaluations conducted on our first prototype.
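The self-indexing idea can be illustrated with rank/select bitmaps. The Python below is a minimal sketch and not WaterFowl's implementation. It assumes a layout similar to the BitmapTriples structure of HDT: triples are integer-encoded, sorted by subject, and split into a predicate layer and an object layer, each paired with a bitmap whose 1s mark the start of a new group. The class names `RankSelectBitmap` and `BitmapTriples` are hypothetical. The predicate and object sequences are plain lists, where a real store would use compressed structures such as wavelet trees. The bitmap keeps only per-block rank counts rather than a truly succinct directory. Answering a pattern (s, p, ?o) then needs only `rank`/`select` calls and slicing, with no decompression step.

```python
from typing import List, Tuple


class RankSelectBitmap:
    """Bitmap supporting rank1/select1. Simplified: per-block rank counts
    plus a scan inside the block, not an o(n)-space succinct index."""

    def __init__(self, bits: List[int], block: int = 8) -> None:
        self.bits = bits
        self.block = block
        # block_rank[b] = number of 1s in bits[0 : b * block]
        self.block_rank = [0]
        for i in range(0, len(bits), block):
            self.block_rank.append(self.block_rank[-1] + sum(bits[i:i + block]))

    def rank1(self, i: int) -> int:
        """Number of 1s in bits[0:i]."""
        b = i // self.block
        return self.block_rank[b] + sum(self.bits[b * self.block:i])

    def select1(self, k: int) -> int:
        """Position of the k-th 1 (k counts from 1), via binary search on rank1."""
        if not 1 <= k <= self.rank1(len(self.bits)):
            raise ValueError("k out of range")
        lo, hi = 0, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo


class BitmapTriples:
    """Two-layer subject -> predicates -> objects layout over integer IDs.
    Assumes subject IDs are dense and start at 0 (a hypothetical convention)."""

    def __init__(self, triples: List[Tuple[int, int, int]]) -> None:
        triples = sorted(set(triples))
        subjects = sorted({s for s, _, _ in triples})
        if subjects != list(range(len(subjects))):
            raise ValueError("subject IDs must be dense and start at 0")
        self.preds: List[int] = []
        self.objs: List[int] = []
        bp: List[int] = []  # 1 marks the first predicate of each subject
        bo: List[int] = []  # 1 marks the first object of each (s, p) pair
        prev_s, prev_sp = None, None
        for s, p, o in triples:
            if (s, p) != prev_sp:
                self.preds.append(p)
                bp.append(1 if s != prev_s else 0)
                bo.append(1)
            else:
                bo.append(0)
            self.objs.append(o)
            prev_s, prev_sp = s, (s, p)
        self.bp = RankSelectBitmap(bp)
        self.bo = RankSelectBitmap(bo)
        self.n_subjects = len(subjects)

    def objects(self, s: int, p: int) -> List[int]:
        """Answer the pattern (s, p, ?o) directly on the compact layers."""
        if not 0 <= s < self.n_subjects:
            return []
        start = self.bp.select1(s + 1)
        end = self.bp.select1(s + 2) if s + 1 < self.n_subjects else len(self.preds)
        for j in range(start, end):  # a wavelet tree would locate p with rank/select
            if self.preds[j] == p:
                o_start = self.bo.select1(j + 1)
                o_end = self.bo.select1(j + 2) if j + 1 < len(self.preds) else len(self.objs)
                return self.objs[o_start:o_end]
        return []


store = BitmapTriples([(0, 1, 5), (0, 1, 6), (0, 2, 7), (1, 1, 8)])
print(store.objects(0, 1))  # [5, 6]
```

Because subject boundaries are recovered with `select1` rather than stored as explicit offsets, the layout stays compact and remains directly queryable.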
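An inference-enabled dictionary can be approximated with a prefix-based identifier scheme. The Python below is a minimal sketch and not the paper's exact encoding. It assumes a tree-shaped class hierarchy, meaning single inheritance and no cycles. Each class receives an integer ID whose leading bits are its parent's code followed by a local child index, so every subclass of a class C falls in one contiguous interval [lo, hi]. During preprocessing, `rdfs:subClassOf` triples are split off as the TBox and used only to build the dictionary. The `rdf:type` assertions of the ABox are stored with encoded class IDs. Retrieving all instances of C under subclass entailment (RDFS rules rdfs9 and rdfs11) then becomes a single range lookup, with no materialized inferred triples and no rewriting of the query into a union over subclasses. The `ex:` names and the functions `split_tbox_abox`, `encode_hierarchy`, and `TypeIndex` are hypothetical. Property hierarchies (`rdfs:subPropertyOf`) could presumably be encoded in a similar way but are omitted.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
Triple = Tuple[str, str, str]


def split_tbox_abox(triples: List[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Separate terminological (subclass) triples from assertional ones."""
    tbox = [t for t in triples if t[1] == RDFS_SUBCLASS_OF]
    abox = [t for t in triples if t[1] != RDFS_SUBCLASS_OF]
    return tbox, abox


def encode_hierarchy(subclass_of: Dict[str, str], root: str) -> Dict[str, Tuple[int, int]]:
    """Map each class to an ID interval [lo, hi] holding exactly the class and
    its subclasses. Assumes a tree (single inheritance) rooted at `root`."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in subclass_of.items():
        children[parent].append(child)
    prefixes: Dict[str, Tuple[int, int]] = {}  # class -> (code, code length in bits)
    stack = [(root, 0, 0)]
    while stack:
        cls, code, length = stack.pop()
        prefixes[cls] = (code, length)
        kids = sorted(children[cls])
        width = len(kids).bit_length()  # enough bits for local indices 1..k; 0 is reserved
        for i, kid in enumerate(kids, start=1):
            stack.append((kid, (code << width) | i, length + width))
    total = max(length for _, length in prefixes.values())
    intervals = {}
    for cls, (code, length) in prefixes.items():
        shift = total - length
        # Left-align the code; descendants extend it, so they land inside [lo, hi].
        intervals[cls] = (code << shift, ((code + 1) << shift) - 1)
    return intervals


class TypeIndex:
    """rdf:type assertions stored as (class ID, individual), sorted by class ID."""

    def __init__(self, triples: List[Triple], root: str = "owl:Thing") -> None:
        tbox, abox = split_tbox_abox(triples)
        self.intervals = encode_hierarchy({s: o for s, _, o in tbox}, root)
        self.types = sorted(
            (self.intervals[o][0], s) for s, p, o in abox if p == RDF_TYPE
        )

    def instances_of(self, cls: str) -> List[str]:
        """Instances of cls, including individuals typed with any of its subclasses."""
        lo, hi = self.intervals[cls]
        start = bisect_left(self.types, (lo, ""))
        end = bisect_left(self.types, (hi + 1, ""))
        return sorted({s for _, s in self.types[start:end]})


TRIPLES: List[Triple] = [
    ("ex:Person", RDFS_SUBCLASS_OF, "owl:Thing"),
    ("ex:Organization", RDFS_SUBCLASS_OF, "owl:Thing"),
    ("ex:Student", RDFS_SUBCLASS_OF, "ex:Person"),
    ("ex:Professor", RDFS_SUBCLASS_OF, "ex:Person"),
    ("ex:GraduateStudent", RDFS_SUBCLASS_OF, "ex:Student"),
    ("ex:alice", RDF_TYPE, "ex:GraduateStudent"),
    ("ex:bob", RDF_TYPE, "ex:Professor"),
    ("ex:acme", RDF_TYPE, "ex:Organization"),
]

index = TypeIndex(TRIPLES)
print(index.instances_of("ex:Person"))  # ['ex:alice', 'ex:bob']
```

Here `ex:alice` is found as an `ex:Person` even though only `ex:alice rdf:type ex:GraduateStudent` is stored. The subsumption is implied by the numeric ID falling inside the `ex:Person` interval.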
