Containment and equivalence for an XPath fragment

XPath is a simple language for navigating an XML document and selecting a set of element nodes. XPath expressions are used to query XML data, describe key constraints, express transformations, and reference elements in remote documents. This paper studies the containment and equivalence problems for a fragment of the XPath query language, with applications in all these contexts. In particular, we study a class of XPath queries that combine branching, label wildcards, and descendant relationships between nodes. Prior work has shown that languages which combine any two of these three features have efficient containment algorithms. However, we show that for the combination of all three features, containment is coNP-complete. We provide a sound and complete EXPTIME algorithm for containment, and study parameterized PTIME special cases. While we identify two parameterized classes of queries for which containment can be decided efficiently, we also show that even with some bounded parameters, containment is coNP-complete. In response to these negative results, we describe a sound algorithm which is efficient for all queries, but may return false negatives in some cases.
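To make the idea of a sound but incomplete containment test concrete, the following is a minimal sketch of one standard technique: searching for a homomorphism between tree patterns. The abstract does not say which technique the sound algorithm uses, so treating it as a homomorphism test is an assumption here. The `Node` class, the `"/"` and `"//"` axis markers, and the function names are hypothetical, and the sketch handles only Boolean patterns (no distinguished output node) whose root is matched at the document root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree-pattern node: a label (or "*") plus outgoing edges."""
    label: str
    # Each entry is (axis, child) where axis is "/" (child) or "//" (descendant).
    children: list = field(default_factory=list)


def descendants(n):
    """Yield every proper descendant of n, following edges of either axis."""
    for _, c in n.children:
        yield c
        yield from descendants(c)


def maps_to(q, p):
    """True if the subpattern rooted at q maps homomorphically onto node p."""
    # A labelled node of q must land on a node of p with the same label;
    # a wildcard in q may land on anything.
    if q.label != "*" and q.label != p.label:
        return False
    for axis, qc in q.children:
        if axis == "/":
            # A child edge must map to a child edge of p.
            candidates = [c for a, c in p.children if a == "/"]
        else:
            # A descendant edge may map to any proper descendant of p.
            candidates = descendants(p)
        if not any(maps_to(qc, c) for c in candidates):
            return False
    return True


def contained_sound(p, q):
    """Sound test for p contained in q: True implies containment,
    False is inconclusive (it may be a false negative)."""
    return maps_to(q, p)


# a[b]/c, read as a Boolean pattern: an a with a b child and a c child.
p1 = Node("a", [("/", Node("b")), ("/", Node("c"))])
# a//c: an a with some c descendant.
q1 = Node("a", [("//", Node("c"))])

# a/*//b and a//*/b both require a b at depth two or more below a.
p2 = Node("a", [("/", Node("*", [("//", Node("b"))]))])
q2 = Node("a", [("//", Node("*", [("/", Node("b"))]))])
```

Here `contained_sound(p1, q1)` succeeds because every document matching `a[b]/c` also matches `a//c`. The pair `p2`, `q2` illustrates a false negative: the two patterns are equivalent, yet no homomorphism exists in either direction once wildcards and descendant edges are mixed, so the test reports failure both ways.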
