Static analysis and optimization of semantic web queries

Static analysis is a fundamental task in query optimization. In this paper we study static analysis and optimization techniques for SPARQL, the standard language for querying Semantic Web data. Of particular interest to us is the optionality feature of SPARQL. It is crucial in Semantic Web data management, where data sources are inherently incomplete and the user is usually interested in partial answers to queries. This feature is one of the most complicated constructs in SPARQL and also the one that makes the language depart from classical query languages such as relational conjunctive queries. We focus on the class of well-designed SPARQL queries, which has been proposed in the literature as a fragment of the language with good properties regarding query evaluation. We first propose a tree representation for SPARQL queries, called pattern trees, which captures the class of well-designed SPARQL graph patterns and which can be considered as a query execution plan. Among other results, we propose several transformation rules for pattern trees and a simple normal form, and we study equivalence and containment. We also study the enumeration and counting problems for this class of queries.
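
To illustrate why optionality yields partial answers over incomplete data, the following is a minimal Python sketch of the usual mapping-based reading of OPTIONAL as a left outer join. It is not the pattern-tree machinery of the paper; the helper names (`match_triple`, `compatible`, `join`, `optional`), the toy graph, and the `?`-prefix convention for variables are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]  # variable name -> bound value


def is_var(term: str) -> bool:
    # Assumption for this sketch: variables are strings starting with "?".
    return term.startswith("?")


def match_triple(pattern: Triple, graph: List[Triple]) -> List[Mapping]:
    """Return every mapping under which the triple pattern occurs in the graph."""
    results: List[Mapping] = []
    for triple in graph:
        mapping: Mapping = {}
        ok = True
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A repeated variable must be bound to the same value.
                if mapping.get(p, t) != t:
                    ok = False
                    break
                mapping[p] = t
            elif p != t:
                ok = False
                break
        if ok:
            results.append(mapping)
    return results


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """AND: merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def optional(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """OPTIONAL: extend a left mapping where possible, otherwise keep it as is."""
    out: List[Mapping] = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(extended if extended else [a])
    return out


# Toy data: bob has no email, so the data source is incomplete for him.
graph = [
    ("alice", "name", "Alice"),
    ("alice", "email", "a@example.org"),
    ("bob", "name", "Bob"),
]
names = match_triple(("?p", "name", "?n"), graph)
emails = match_triple(("?p", "email", "?e"), graph)

print(join(names, emails))      # only alice survives the plain join
print(optional(names, emails))  # bob is kept, with ?e left unbound
```

In this sketch the plain join discards bob entirely, whereas the optional part returns him with `?e` unbound, which is the kind of partial answer the abstract refers to.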
