Lightweighting the Web of Data through Compact RDF/HDT

The Web of Data is producing large RDF datasets from diverse fields. The increasing size of the published data threatens to make these datasets hard to exchange, index, and consume. This scalability problem greatly diminishes the potential of interconnected RDF graphs. The HDT format addresses these problems with a compact RDF representation that partitions the data into three efficiently represented components: Header (metadata), Dictionary (strings occurring in the dataset), and Triples (graph structure). This paper revisits the format and exploits the latest findings in triples indexing for querying, exchanging, and visualizing RDF information at large scale.
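To make the Header/Dictionary/Triples partition concrete, the following is a minimal sketch, not the actual HDT encoding. It assumes triples are given as tuples of plain strings; the function names `encode` and `decode` and the header fields `num_triples` and `num_terms` are hypothetical and chosen only for illustration. It shows the general idea: each distinct string is stored once in a dictionary, and the graph structure is kept as triples of integer IDs.

```python
def encode(triples):
    """Split string triples into a header, a dictionary, and ID triples.

    Illustrative simplification only: the real HDT format uses more
    compact encodings for each of the three components.
    """
    # Dictionary: every distinct term is stored once and mapped to an integer ID.
    terms = sorted({term for triple in triples for term in triple})
    dictionary = {term: i for i, term in enumerate(terms, start=1)}

    # Triples: the graph structure, expressed purely as sorted ID tuples.
    id_triples = sorted({tuple(dictionary[t] for t in triple) for triple in triples})

    # Header: metadata about the dataset (hypothetical fields).
    header = {"num_triples": len(id_triples), "num_terms": len(dictionary)}
    return header, dictionary, id_triples


def decode(dictionary, id_triples):
    """Recover the original string triples from the ID triples."""
    inverse = {i: term for term, i in dictionary.items()}
    return [tuple(inverse[i] for i in triple) for triple in id_triples]


if __name__ == "__main__":
    data = [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
    ]
    header, dictionary, id_triples = encode(data)
    print(header)
    print(id_triples)
    print(decode(dictionary, id_triples))
```

Because long, repeated URIs and literals appear only once in the dictionary, the triples component reduces to small integers, which is what makes the structure amenable to compact storage and indexing.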
