Scalable join processing on very large RDF graphs

With the proliferation of the RDF data format, engines for RDF query processing face very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans. We present two contributions for scalable join processing. First, we develop very lightweight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimates for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.
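The abstract describes sideways information passing only at a high level. The Python sketch below illustrates the general idea under stated assumptions. Each triple pattern of a star query is assumed to be answered by an index scan that returns sorted integer subject IDs and can jump forward to a given ID. A lower bound shared between the inputs at run time then lets every scan skip entries that cannot join. `SortedScan`, `star_join_with_sip`, the predicates `p1`/`p2`/`p3`, and the sample data are all hypothetical. The code is a simple leapfrog-style multi-way intersection, offered as a stand-in rather than the paper's actual operators.

```python
from bisect import bisect_left
from typing import List, Optional, Set


class SortedScan:
    """Hypothetical index scan over a sorted list of integer subject IDs.

    seek() stands in for an index lookup that jumps over a gap of IDs
    instead of reading every entry in between (assumption of this sketch).
    """

    def __init__(self, ids: List[int]) -> None:
        self.ids = sorted(ids)
        self.pos = 0
        self.visited: Set[int] = set()  # positions actually materialized

    def seek(self, target: int) -> Optional[int]:
        # Move forward to the first ID >= target; IDs in between are skipped.
        self.pos = bisect_left(self.ids, target, self.pos)
        if self.pos == len(self.ids):
            return None  # scan exhausted
        self.visited.add(self.pos)
        return self.ids[self.pos]


def star_join_with_sip(scans: List[SortedScan]) -> List[int]:
    """Join several scans on one shared variable (a star query on ?s).

    The shared lower bound `hint` is the information passed sideways:
    whenever one scan produces a larger ID, every other scan skips ahead to it.
    """
    results: List[int] = []
    hint = 0  # assumes non-negative dictionary-encoded IDs
    while True:
        changed = False
        for scan in scans:
            value = scan.seek(hint)
            if value is None:
                return results  # one input is exhausted: no further matches
            if value != hint:
                hint = value  # value > hint, so raise the shared bound
                changed = True
        if not changed:
            # Every scan is positioned exactly at hint: it is a join result.
            results.append(hint)
            hint += 1


if __name__ == "__main__":
    # Hypothetical star query: ?s <p1> ?a . ?s <p2> ?b . ?s <p3> ?c
    p1 = SortedScan(list(range(1000)))
    p2 = SortedScan([10, 500, 999])
    p3 = SortedScan(list(range(0, 1000, 2)))
    print(star_join_with_sip([p1, p2, p3]))  # [10, 500]
    print(len(p1.visited), "of", len(p1.ids))  # 5 of 1000
```

In this toy run only 5 of the 1,000 positions of the largest input are ever materialized. This illustrates the kind of input filtering that run-time information passing between join inputs aims for. A real engine would seek in a compressed index rather than a Python list.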
