Left Bit Right: For SPARQL Join Queries with OPTIONAL Patterns (Left-outer-joins)

SPARQL basic graph pattern (BGP) (a.k.a. SQL inner-join) query optimization is a well-researched area. However, optimization of OPTIONAL pattern queries (a.k.a. SQL left-outer-joins) poses additional challenges because left-outer-joins cannot be freely reordered. Such queries can account for as much as 50% of all queries (e.g., in DBpedia query logs). In this paper, we present Left Bit Right (LBR), a technique for well-designed nested BGP and OPTIONAL pattern queries. Through LBR, we propose a novel method to represent such queries using a graph of supernodes, which is used to aggressively prune the RDF triples with the help of compressed indexes. We also propose novel optimization strategies, the first of their kind to the best of our knowledge, that combine the characteristics of query acyclicity, minimality, and the nullification and best-match operators. In this paper, we focus on OPTIONAL patterns without UNIONs or FILTERs, but we also show how UNIONs and FILTERs can be handled with our technique using a query rewrite. Our evaluation on RDF graphs of up to and over one billion triples, on a commodity laptop with 8 GB memory, shows that LBR can process well-designed low-selectivity complex queries up to 11 times faster than state-of-the-art RDF column-stores such as Virtuoso and MonetDB, and that for highly selective queries LBR is on par with them.
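To make the reordering restriction concrete, the following is a minimal sketch, not part of LBR itself, showing that an inner join and a left-outer-join cannot be freely reassociated. It models SPARQL solution mappings as Python dictionaries; the helper names `compatible`, `join`, and `left_join`, and the toy inputs `A`, `B`, and `C`, are assumptions made for illustration only.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[var] == val for var, val in m1.items() if var in m2)


def join(left, right):
    """Inner join: merge every compatible pair of mappings."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def left_join(left, right):
    """Left-outer-join (OPTIONAL): keep a left mapping even if nothing matches."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(matches if matches else [l])
    return out


# Hypothetical toy inputs: B and C disagree on the value of y.
A = [{"x": 1}]
B = [{"x": 1, "y": 2}]
C = [{"y": 3}]

plan_1 = join(left_join(A, B), C)   # (A OPTIONAL B) joined with C
plan_2 = left_join(A, join(B, C))   # A OPTIONAL (B joined with C)

print(plan_1)  # []
print(plan_2)  # [{'x': 1}]
```

The two plans differ only in where the parentheses sit, yet one returns no results and the other returns a single mapping. This is why an optimizer cannot reorder joins across an OPTIONAL boundary the way it can within a pure BGP.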
