XPath query containment

Consider an XML publish-subscribe scenario with hundreds of subscribers and tens of thousands of XML documents to be delivered per day. Subscribers specify the documents in which they are interested by means of XPath [8] expressions. If an expression matches (a part of) a document, the document is delivered to the subscriber. Naturally, the decision about which subscribers should receive a document should be made quickly. Although testing whether a single XPath expression matches a document can be done in polynomial time, it is not efficient to test every expression against every document. Fortunately, there is a partial order on expressions, i.e., for some expressions p, q it may hold that whenever a document matches p it also matches q (denoted p ⊆₀ q). If we already know that a document matches p, we do not need to test q, as it matches automatically. Conversely, if we know that q does not match, then p does not match either. Hence, the inclusion structure of the XPath expressions should be computed in advance to reduce online computation time. This leads to the algorithmic problem of XPath Query Containment, i.e., checking whether p ⊆₀ q (for a different, index-based approach see, e.g., [6]). The main aim of this article is to describe some of the main algorithmic techniques that have been proposed for XPath Query Containment. These techniques are described in Section 5. Before that, in Sections 2 and 3 the basic definitions on XPath and the
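
The pruning idea in the scenario above can be illustrated with a minimal Python sketch. It is not the article's algorithm: the function names `matches` and `filter_subscriptions` and the table `contained_in` are hypothetical, the containment relation is assumed to have been computed offline by some containment test, and matching relies on the limited XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def matches(doc, expr):
    """Boolean match: does expr select at least one node of doc?"""
    return doc.find(expr) is not None


def filter_subscriptions(doc, exprs, contained_in):
    """Return (set of matching expressions, number of match tests run).

    contained_in[p] is the set of expressions q such that every document
    matching p also matches q. It is assumed to be precomputed offline.
    """
    result = {}  # expression -> known match status for this document
    tests = 0
    for p in exprs:
        if p in result:
            continue  # status already implied by an earlier test
        tests += 1
        hit = matches(doc, p)
        result[p] = hit
        if hit:
            # p matches, so every q containing p matches as well.
            for q in contained_in.get(p, ()):
                result[q] = True
        else:
            # p fails, so every r contained in p fails as well.
            for r in exprs:
                if p in contained_in.get(r, ()):
                    result[r] = False
    return {e for e, ok in result.items() if ok}, tests


# Hypothetical subscriptions; "./a/b" is contained in ".//b".
exprs = ["./a/b", ".//b", ".//c"]
contained_in = {"./a/b": {".//b"}}
doc = ET.fromstring("<r><a><b/></a></r>")
matched, tests = filter_subscriptions(doc, exprs, contained_in)
print(sorted(matched), tests)  # ['.//b', './a/b'] 2
```

In this run only two of the three expressions are tested, because the match of `./a/b` already implies the match of `.//b`. The order in which expressions are tested affects how much is saved; the sketch simply uses the given order.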

[1] Georg Gottlob, et al. The complexity of XPath query evaluation, 2003, PODS.

[2] Frank Neven, et al. Automata theory for XML researchers, 2002, SIGMOD Rec.

[3] Diego Calvanese, et al. Decidable Containment of Recursive Queries, 2003, ICDT.

[4] Peter T. Wood. Minimising Simple XPath Expressions, 2001, WebDB.

[5] Laks V. S. Lakshmanan, et al. Tree pattern query minimization, 2002, The VLDB Journal.

[6] Peter T. Wood, et al. On the Equivalence of XML Patterns, 2000, Computational Logic.

[7] Ashok K. Chandra, et al. Optimal implementation of conjunctive queries in relational data bases, 1977, STOC '77.

[8] Peter T. Wood, et al. Containment for XPath Fragments under DTD Constraints, 2003, ICDT.

[9] Maarten Marx, et al. XPath with Conditional Axis Relations, 2004, EDBT.

[10] Serge Abiteboul, et al. Foundations of Databases, 1994.

[11] Alex Thomo, et al. Query containment and rewriting using views for regular path queries under constraints, 2003, PODS.

[12] Georg Gottlob, et al. Efficient Algorithms for Processing XPath Queries, 2002, VLDB.

[13] Alin Deutsch, et al. Containment and Integrity Constraints for XPath, 2001, KRDB.

[14] Gabriel M. Kuper, et al. Structural Properties of XPath Fragments, 2003, ICDT.

[15] Steven J. DeRose, et al. XML Path Language (XPath) Version 1.0, 1999.

[16] Dan Suciu, et al. Containment and equivalence for an XPath fragment, 2002, PODS.

[17] Georg Gottlob, et al. XPath processing in a nutshell, 2003, SIGMOD Rec.

[18] Dennis Shasha, et al. Algorithmics and applications of tree and graph searching, 2002, PODS.

[19] Thomas Schwentick, et al. XPath Containment in the Presence of Disjunction, DTDs, and Variables, 2003, ICDT.

[20] Dan Suciu, et al. Containment and equivalence for a fragment of XPath, 2004, JACM.

[21] Dan Suciu, et al. Index Structures for Path Expressions, 1999, ICDT.

[22] Tim Furche, et al. XPath: Looking Forward, 2002, EDBT Workshops.

[23] David Maier, et al. Testing implications of data dependencies, 1979, SIGMOD '79.

[24] Georg Gottlob, et al. XPath query evaluation: improving time and space efficiency, 2003, Proceedings 19th International Conference on Data Engineering.

[25] Rajeev Rastogi, et al. Efficient filtering of XML documents with XPath expressions, 2002, The VLDB Journal.