A Pattern-Based Approach for Efficient Query Processing over RDF Data

The growing prevalence of Linked Data has drawn research attention to the efficiency of query execution over the web of data. Search and query engines crawl triples, index them in a centralized repository, and execute queries locally. Prior work has shown that the performance bottleneck of large-scale query execution lies in joins and unions. Based on the observation that many join operations produce a much smaller binding set that can be precomputed and stored, we propose augmenting RDF indexes to store the bindings of complex patterns and exploiting these patterns to improve query performance. In addition to the index, we introduce two strategies for selecting these patterns: one relies on heuristic rules, and the other uses query history to optimize the time-space ratio. Our empirical study shows that the proposed pattern index outperforms a traditional triple index by up to three orders of magnitude while keeping the overhead low.
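
To make the idea of storing precomputed pattern bindings concrete, the following is a minimal Python sketch, not the implementation evaluated in the paper. It assumes a toy in-memory triple set, restricts "complex patterns" to two-step paths of the form `?x p1 ?y . ?y p2 ?z`, and takes the list of patterns to materialize as given rather than choosing it by heuristics or query history. All names (`TRIPLES`, `build_predicate_index`, `join_path`, `build_pattern_index`, `query_path`) are hypothetical and introduced only for illustration.

```python
from collections import defaultdict

# Toy data: (subject, predicate, object) triples. Illustrative only.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "globex"),
    ("acme", "locatedIn", "berlin"),
    ("globex", "locatedIn", "paris"),
    ("alice", "knows", "bob"),
]


def build_predicate_index(triples):
    """Conventional triple index: predicate -> list of (subject, object)."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


def join_path(pred_index, p1, p2):
    """Evaluate ?x p1 ?y . ?y p2 ?z with a hash join on the shared variable ?y."""
    objects_by_subject = defaultdict(list)
    for y, z in pred_index.get(p2, []):
        objects_by_subject[y].append(z)
    return [
        (x, y, z)
        for x, y in pred_index.get(p1, [])
        for z in objects_by_subject.get(y, [])
    ]


def build_pattern_index(pred_index, patterns):
    """Precompute and store the bindings of the selected path patterns."""
    return {(p1, p2): join_path(pred_index, p1, p2) for p1, p2 in patterns}


def query_path(pred_index, pattern_index, p1, p2):
    """Use stored bindings when the pattern is indexed; otherwise join at query time."""
    stored = pattern_index.get((p1, p2))
    if stored is not None:
        return stored
    return join_path(pred_index, p1, p2)


pred_index = build_predicate_index(TRIPLES)
# Assumption: this pattern was selected for materialization by some strategy.
pattern_index = build_pattern_index(pred_index, [("worksAt", "locatedIn")])

indexed_result = query_path(pred_index, pattern_index, "worksAt", "locatedIn")
fallback_result = query_path(pred_index, pattern_index, "knows", "worksAt")
```

In this sketch, a query matching an indexed pattern becomes a single dictionary lookup, while any other pattern falls back to the ordinary join over the triple index. The space cost is the stored binding list per materialized pattern, which is why the choice of patterns matters.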
