BitMat – Scalable Indexing and Querying of Large RDF Graphs (Technical Report)

The growing volume of Semantic Web data expressed in the Resource Description Framework (RDF) makes it necessary to store this data compactly and to query it in a scalable manner. SPARQL, the query language for RDF data, closely follows SQL syntax. As a natural consequence, most RDF storage and query engines are based on modern database storage and query optimization techniques. Previous work has used vertical partitioning on column stores (C-Store, MonetDB) and 6-way indexing (RDF-3X, Hexastore) to store and query RDF data. Although these approaches perform well for highly selective queries, scaling the query method and its optimizations to queries with low-selectivity triple patterns remains a challenge. In this paper we present BitMat, a new way of storing RDF graphs in a run-length-encoded bit-vector format, and we propose a novel two-phase algorithm for processing SPARQL join queries. In the first phase the algorithm prunes the candidate RDF triples; in the second phase it stitches the pruned triples together to generate the final results. Our query processing method does not build intermediate join tables and works directly on the compressed data. Our evaluation shows that BitMat not only stores RDF graphs efficiently, but also that its join query processing algorithm scales well for low-selectivity join queries, where state-of-the-art RDF query processors face problems.
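To make the idea of working directly on compressed data more concrete, the following minimal sketch run-length encodes two bit rows and intersects them without decoding either one. It is an illustration only: the function names (`rle_encode`, `rle_decode`, `rle_and`) and the `(first_bit, run_lengths)` layout are assumptions made for this example, not the encoding or API used by BitMat itself. An intersection of this kind is the sort of operation a pruning phase could use to discard candidate bindings that do not appear in both triple patterns of a join.

```python
from itertools import groupby


def rle_encode(bits):
    """Encode a list of 0/1 values as (first_bit, [run lengths]).

    Hypothetical layout chosen for this sketch only.
    """
    if not bits:
        return (0, [])
    runs = [len(list(group)) for _, group in groupby(bits)]
    return (bits[0], runs)


def rle_decode(encoded):
    """Expand (first_bit, [run lengths]) back into a list of 0/1 values."""
    bit, runs = encoded
    out = []
    for length in runs:
        out.extend([bit] * length)
        bit ^= 1  # runs alternate between 0 and 1
    return out


def rle_and(a, b):
    """Bitwise AND of two equal-length RLE rows, walking runs instead of bits."""
    (bit_a, runs_a), (bit_b, runs_b) = a, b
    if sum(runs_a) != sum(runs_b):
        raise ValueError("rows must have the same length")

    out = []  # list of [bit, run length], merged as we go
    i = j = 0
    rem_a = runs_a[0] if runs_a else 0
    rem_b = runs_b[0] if runs_b else 0
    while i < len(runs_a) and j < len(runs_b):
        step = min(rem_a, rem_b)  # overlap of the two current runs
        bit = bit_a & bit_b
        if out and out[-1][0] == bit:
            out[-1][1] += step
        else:
            out.append([bit, step])
        rem_a -= step
        rem_b -= step
        if rem_a == 0:  # advance to the next run of row a
            i += 1
            bit_a ^= 1
            if i < len(runs_a):
                rem_a = runs_a[i]
        if rem_b == 0:  # advance to the next run of row b
            j += 1
            bit_b ^= 1
            if j < len(runs_b):
                rem_b = runs_b[j]

    if not out:
        return (0, [])
    return (out[0][0], [length for _, length in out])


# Two hypothetical rows marking which candidate values occur in each triple pattern.
row_a = rle_encode([1, 1, 0, 0, 1, 0])
row_b = rle_encode([0, 1, 1, 0, 1, 1])
common = rle_and(row_a, row_b)
print(rle_decode(common))
```

The cost of `rle_and` depends on the number of runs rather than the number of bits, which is why sparse or clustered rows can be intersected cheaply in compressed form.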
