Compact Representation of Large RDF Data Sets for Publishing and Exchange

Increasingly large RDF data sets are being published on the Web. Currently, they are serialized in a variety of RDF syntaxes, contain high levels of redundancy, and have a plain, indivisible structure. This leads to fuzzy publications, inefficient management, complex processing, and a lack of scalability. This paper presents a novel RDF representation, HDT, which exploits the structural properties of RDF graphs to split RDF data into three efficiently represented components: Header, Dictionary, and Triples. On-demand management operations can be implemented on top of the HDT representation. Experiments show that HDT compacts data sets by a factor of more than fifteen relative to the current naive representation, improving parsing and processing while keeping a consistent publication scheme. For exchange, specific compression techniques applied over HDT improve on current compression solutions.
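The three-way split can be illustrated with a minimal Python sketch. It is not the HDT format itself: the function names `to_hdt_like` and `from_hdt_like`, the header fields, and the single shared term dictionary are assumptions made for illustration only. The sketch shows the general idea of replacing repeated term strings with integer IDs so that the triples become a compact list of ID tuples.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def to_hdt_like(triples: List[Triple]) -> Tuple[Dict[str, int], List[str], List[IdTriple]]:
    """Split triples into a header, a term dictionary, and ID triples.

    Hypothetical layout: one shared dictionary for subjects, predicates,
    and objects, with IDs starting at 1.
    """
    # Dictionary: each distinct term is stored once, in sorted order.
    terms = sorted({term for triple in triples for term in triple})
    term_to_id = {term: i + 1 for i, term in enumerate(terms)}

    # Triples: the graph structure, expressed only with integer IDs.
    id_triples = sorted(
        (term_to_id[s], term_to_id[p], term_to_id[o]) for s, p, o in triples
    )

    # Header: illustrative metadata about the data set (assumed fields).
    header = {"num_triples": len(id_triples), "num_terms": len(terms)}
    return header, terms, id_triples


def from_hdt_like(terms: List[str], id_triples: List[IdTriple]) -> List[Triple]:
    """Rebuild the original string triples from the dictionary and ID triples."""
    return [(terms[s - 1], terms[p - 1], terms[o - 1]) for s, p, o in id_triples]


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:alice", "foaf:name", '"Alice"'),
        ("ex:bob", "foaf:knows", "ex:alice"),
    ]
    header, terms, ids = to_hdt_like(data)
    print(header)
    print(terms)
    print(ids)
```

In this sketch, a term such as `ex:alice` that appears in several triples is stored once in the dictionary, which is the kind of redundancy the abstract refers to.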
