Paper information - Compressed k2-Triples for Full-In-Memory RDF Engines

Compressed k2-Triples for Full-In-Memory RDF Engines

The current "data deluge" has flooded the Web of Data with very large RDF datasets. They are hosted and queried through SPARQL endpoints, which act as nodes of a semantic net built on the principles of the Linked Data project. Although this is a realistic philosophy for global data publishing, query performance degrades when the RDF engines behind the endpoints manage these huge datasets: their indexes cannot be fully loaded into main memory, so these systems must perform slow disk accesses to resolve SPARQL queries. This paper addresses the problem with a compact indexed RDF structure, called k2-triples, that applies compact k2-tree structures to the well-known vertical-partitioning technique. It obtains an ultra-compressed representation of large RDF graphs and allows SPARQL queries to be resolved fully in memory, without decompression. We show that k2-triples clearly outperforms the state of the art in compressibility and traditional vertical partitioning in query resolution, while remaining very competitive with multi-index solutions.
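The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes k = 2, a single shared dictionary for subjects and objects, and plain Python lists in place of succinct bitmaps, so it saves no space and its rank operation is linear rather than constant-time. The names `build_k2tree`, `has_cell`, `build_k2triples`, and `ask` are hypothetical. Triples are split by predicate (vertical partitioning), and each predicate's subject-by-object binary matrix is stored as the two bit sequences T (internal levels) and L (last level) of a k2-tree.

```python
from collections import defaultdict

K = 2  # assumed arity per dimension: each node has K*K children


def build_k2tree(cells, size):
    """Build (T, L) for a size x size binary matrix given its 1-cells.

    size is assumed to be a power of K and at least K.
    """
    levels = []  # one bit list per tree level, in breadth-first order
    frontier = [(0, 0, size, list(cells))]  # (row0, col0, side, 1-cells inside)
    while frontier and frontier[0][2] > 1:
        bits, nxt = [], []
        for r0, c0, side, pts in frontier:
            sub = side // K
            for i in range(K):
                for j in range(K):
                    rr, cc = r0 + i * sub, c0 + j * sub
                    inside = [(r, c) for r, c in pts
                              if rr <= r < rr + sub and cc <= c < cc + sub]
                    bits.append(1 if inside else 0)
                    if inside:  # only non-empty submatrices are expanded
                        nxt.append((rr, cc, sub, inside))
        levels.append(bits)
        frontier = nxt
    T = [b for lvl in levels[:-1] for b in lvl]
    L = levels[-1]
    return T, L


def has_cell(T, L, size, row, col):
    """Check one matrix cell by descending the tree; nothing is decompressed."""
    bits = T + L
    base, side = 0, size  # base: start of the current node's children
    while True:
        side //= K
        p = base + (row // side) * K + (col // side)
        if bits[p] == 0:
            return False
        if side == 1:
            return True
        row, col = row % side, col % side
        base = sum(T[:p + 1]) * K * K  # rank1(T, p) * K^2; linear-time here


def build_k2triples(triples):
    """Vertical partitioning: one k2-tree per predicate."""
    terms = sorted({t for s, _, o in triples for t in (s, o)})
    ids = {term: i for i, term in enumerate(terms)}
    size = K
    while size < len(terms):
        size *= K
    by_pred = defaultdict(set)
    for s, p, o in triples:
        by_pred[p].add((ids[s], ids[o]))
    trees = {p: build_k2tree(cells, size) for p, cells in by_pred.items()}
    return ids, size, trees


def ask(index, s, p, o):
    """Resolve a fully bound (s, p, o) pattern."""
    ids, size, trees = index
    if p not in trees or s not in ids or o not in ids:
        return False
    T, L = trees[p]
    return has_cell(T, L, size, ids[s], ids[o])


index = build_k2triples([
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
])
print(ask(index, "alice", "knows", "bob"))  # True
print(ask(index, "bob", "knows", "alice"))  # False
```

Patterns with an unbound subject or object would traverse a whole row or column of the matrix instead of a single cell; that case is left out of this sketch.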

Nieves R. Brisaboa | Miguel A. Martínez-Prieto | Javier D. Fernández | Sandra Álvarez-García
