Covering Indexes for XML Queries: Bisimulation - Simulation = Negation

Tree Pattern Queries (TPQ), Branching Path Queries (BPQ), and Core XPath (CXPath) are subclasses of the XML query language XPath: TPQ ⊂ BPQ ⊂ CXPath ⊂ XPath. Let TPQ = TPQ+ ⊂ BPQ+ ⊂ CXPath+ ⊂ XPath+ denote the corresponding subclasses, consisting of queries that do not involve the boolean negation operator "not" in their predicates. Simulation and bisimulation are two different binary relations on graph vertices that have previously been studied in connection with some of these classes. For instance, TPQ queries can be minimized using simulation. Most relevantly, for an XML document, its bisimulation quotient is the smallest index that covers (i.e., can be used to answer) all BPQ queries.

Our results are as follows:

• A CXPath+ query can be evaluated on an XML document by computing the simulation of the query tree by the document graph.

• For an XML document, its simulation quotient is the smallest covering index for BPQ+. This, together with the previously known result stated above, leads to the following: for BPQ covering indexes of XML documents, Bisimulation - Simulation = Negation.

• For an XML document, its simulation quotient, with the idref edges ignored throughout, is the smallest covering index for TPQ.

For any XML document, its simulation quotient is never larger than its bisimulation quotient; in some instances, it is exponentially smaller. Our last two results show that disallowing negation in the queries could substantially reduce the size of the smallest covering index.
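To make the notion of "simulation of the query tree by the document graph" concrete, here is a minimal sketch in Python. It assumes the standard forward simulation on node-labeled directed graphs: a document node d simulates a query node q if they carry the same label and every child of q is simulated by some child of d. This may differ in detail from the definition used in the paper (for example, in how parent axes or idref edges are handled). The function name max_simulation, the dictionary encoding of the graphs, and the toy data are all illustrative assumptions, and the naive fixpoint loop is chosen for clarity rather than efficiency.

```python
def max_simulation(q_labels, q_edges, d_labels, d_edges):
    """Return the largest simulation of a query graph by a document graph.

    q_labels / d_labels: dict mapping node -> label.
    q_edges / d_edges:   dict mapping node -> list of child nodes.
    Result: dict mapping each query node to the set of document nodes
    that simulate it (assumed forward-simulation definition).
    """
    # Start from all label-compatible pairs, then prune until stable.
    sim = {
        q: {d for d in d_labels if d_labels[d] == q_labels[q]}
        for q in q_labels
    }
    changed = True
    while changed:
        changed = False
        for q in q_labels:
            for d in list(sim[q]):
                # d survives only if every child of q is simulated
                # by at least one child of d.
                for qc in q_edges.get(q, ()):
                    if not any(dc in sim[qc] for dc in d_edges.get(d, ())):
                        sim[q].discard(d)
                        changed = True
                        break
    return sim


# Hypothetical query: an "a" node with a "b" child.
q_labels = {"qa": "a", "qb": "b"}
q_edges = {"qa": ["qb"]}

# Hypothetical document: node 1 (a) has child 2 (b); node 3 (a) has child 4 (c).
d_labels = {1: "a", 2: "b", 3: "a", 4: "c"}
d_edges = {1: [2], 3: [4]}

sim = max_simulation(q_labels, q_edges, d_labels, d_edges)
print(sim)
```

In this toy instance only document node 1 simulates the query root, since node 3 has the right label but no "b" child. Faster algorithms for computing simulations exist; the loop above only illustrates the definition.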
