Rewriting XPath Queries using View Intersections: Tractability versus Completeness

The standard approach to optimizing XPath queries by rewriting them using views consists of navigating inside a single view's output, so the rewritten query can use only one view. Algorithms for richer classes of XPath rewritings, using intersections or joins on node identifiers, have been proposed, but they either lack completeness guarantees or require additional information about the data. We identify the tightest restrictions under which an XPath query can be rewritten in polynomial time using an intersection of views, and we propose an algorithm that works for any document and any type of identifier. As a by-product, we analyze the complexity of the related problem of deciding whether an XPath query with intersection can be equivalently rewritten as one without intersection or union. We extend our formal study of the view-based rewriting problem for XPath by also describing (i) algorithms for more complex rewrite plans, with no limit on the number of intersection and navigation steps inside view outputs that they employ, and (ii) adaptations of our techniques to XML documents without persistent node Ids, in the presence of XML keys. Complementing our computational complexity study, we describe a proof-of-concept implementation of our techniques and discuss choices that may speed up execution in practice, regarding how rewrite plans are built, tested and executed. We also give a thorough experimental evaluation of these techniques, focusing on scalability and on the running-time improvements achieved by executing view-based plans.
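To make the notion of a rewriting by view intersection concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the document, the two views, and the helper names `evaluate` and `intersect` are all assumptions made for illustration, and Python object identity stands in for node identifiers. It only shows that a query with two predicates on the same node can be answered by intersecting the outputs of two views, each of which checks one predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """<lib>
  <book><author>A</author><title>T1</title><year>2001</year></book>
  <book><title>T2</title><year>2002</year></book>
  <book><author>C</author><title>T3</title></book>
</lib>"""

root = ET.fromstring(DOC)


def evaluate(path):
    """Evaluate a path in ElementTree's limited XPath subset, from the root."""
    return root.findall(path)


def intersect(*view_outputs):
    """Intersect view outputs on node identity, keeping the first view's order.

    Assumption: Python object identity plays the role of a node identifier.
    """
    common = {id(node) for node in view_outputs[0]}
    for output in view_outputs[1:]:
        common &= {id(node) for node in output}
    return [node for node in view_outputs[0] if id(node) in common]


# Two hypothetical views, each checking only one predicate.
v1 = evaluate(".//book[author]/title")
v2 = evaluate(".//book[year]/title")

# Candidate rewriting: the intersection of the two view outputs.
rewriting = intersect(v1, v2)

# Direct evaluation of the query, for comparison.
direct = evaluate(".//book[author][year]/title")

result_titles = [node.text for node in rewriting]
same = [id(n) for n in rewriting] == [id(n) for n in direct]
```

On this document the intersection and the direct evaluation return the same single node. Checking equality on one document does not establish equivalence of the query and the rewriting in general; deciding that equivalence is the problem the abstract addresses.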
