Selectivity Estimation for SPARQL Triple Patterns with Shape Expressions

We optimize the evaluation of conjunctive SPARQL queries on large RDF graphs by exploiting ShEx schema constraints. Our optimization computes a rank for each triple pattern of a query; the ranks determine the order in which the patterns are executed. We first define a set of well-formed ShEx schemas whose properties are useful for SPARQL query optimization. We then define our optimization method, which exploits information extracted from a ShEx schema. Our experiments show the advantages of applying this optimization on top of an existing state-of-the-art query evaluation system.
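
To make the idea of rank-based ordering concrete, the following is a minimal sketch, not the method of the paper. The abstract does not say how ranks are computed, so the sketch assumes a hypothetical table `estimated_matches` that maps each predicate to a schema-derived estimate of how many triples match it, and it simply executes the most selective patterns first. The names `TriplePattern`, `rank_patterns`, and all predicates and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePattern:
    """A SPARQL triple pattern; terms starting with '?' are variables."""
    subject: str
    predicate: str
    obj: str


def is_variable(term: str) -> bool:
    return term.startswith("?")


def rank_patterns(patterns, estimated_matches):
    """Order patterns from most to least selective.

    `estimated_matches` is a hypothetical predicate -> estimate mapping
    (an assumption of this sketch, not a structure defined in the paper).
    Ties are broken in favour of patterns with fewer unbound terms.
    Predicates missing from the mapping are ranked last.
    """
    def key(pattern):
        estimate = estimated_matches.get(pattern.predicate, float("inf"))
        unbound = sum(is_variable(t) for t in (pattern.subject, pattern.obj))
        return (estimate, unbound)

    return sorted(patterns, key=key)


# Illustrative query and made-up estimates.
query = [
    TriplePattern("?p", ":knows", "?f"),
    TriplePattern("?p", ":livesIn", ":Lille"),
    TriplePattern("?p", ":hasPassport", "?n"),
]
estimates = {":knows": 1_000_000, ":livesIn": 50_000, ":hasPassport": 1_000}

for pattern in rank_patterns(query, estimates):
    print(pattern.predicate)
```

In this sketch the rank is just the position in the sorted list; a real optimizer would also have to account for join variables shared between patterns, which is ignored here.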

[1]  François Goasdoué,et al.  CliqueSquare: efficient Hadoop-based RDF query processing , 2013 .

[2]  Ling Liu,et al.  Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning , 2013, Proc. VLDB Endow..

[3]  Giuseppe Castagna,et al.  Optimizing XML querying using type-based document projection , 2013, TODS.

[4]  Pierre Genevès,et al.  SPARQLGX: Efficient Distributed Evaluation of SPARQL with Apache Spark , 2016, International Semantic Web Conference.

[5]  Vassilis Christophides,et al.  Containment and Minimization of RDF/S Query Patterns , 2005, SEMWEB.

[6]  HyeongSik Kim,et al.  Type-based Semantic Optimization for Scalable RDF Graph Pattern Matching , 2017, WWW.

[7]  Harold R. Solbrig,et al.  Shape expressions: an RDF validation and transformation language , 2014, SEM '14.

[8]  Hassan Chafi,et al.  The LDBC Social Network Benchmark: Interactive Workload , 2015, SIGMOD Conference.

[9]  Lei Zou,et al.  gStore: Answering SPARQL Queries via Subgraph Matching , 2011, Proc. VLDB Endow..

[10]  George H. L. Fletcher,et al.  gMark: Schema-Driven Generation of Graphs and Queries , 2015, IEEE Transactions on Knowledge and Data Engineering.

[11]  Pascal Hitzler,et al.  Logical Linked Data Compression , 2013, ESWC.

[12]  Iovka Boneva,et al.  Complexity and Expressiveness of ShEx for RDF , 2015, ICDT.

[13]  Ioannis Konstantinou,et al.  H2RDF+: High-performance distributed joins over large-scale RDF graphs , 2013, 2013 IEEE International Conference on Big Data.

[14]  Peter A. Boncz,et al.  Deriving an Emergent Relational Schema from RDF Data , 2015, WWW.

[15]  Daniel J. Abadi,et al.  Scalable Semantic Web Data Management Using Vertical Partitioning , 2007, VLDB.

[16]  Michael Schmidt,et al.  Foundations of SPARQL query optimization , 2008, ICDT '10.