Optimizing Sorting and Duplicate Elimination in XQuery Path Expressions

XQuery expressions can manipulate two kinds of order: document order and sequence order. While the user can impose or observe the order of items within a sequence, the results of path expressions must always be returned in document order and without duplicates. Correctness can be guaranteed by inserting explicit (and expensive) operations that sort and remove duplicates after each XPath step. However, many of these operations are redundant. In this paper, we present a systematic approach to removing unnecessary sorting and duplicate-elimination operations from path expressions in XQuery 1.0. The technique uses an automaton-based algorithm, which we have applied successfully to path expressions within a complete XQuery implementation. Experimental results show that the algorithm detects and eliminates most redundant sorting and duplicate-elimination operators and is very effective on common XQuery path expressions.
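To make the redundancy concrete, the following is a minimal Python sketch, not the automaton-based algorithm of the paper. It evaluates a path step by step over a toy tree, once with a sort-and-deduplicate operation after every step and once without. All names (`Node`, `ddo`, `eval_naive`, `eval_raw`) and the sample tree are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    pre: int = -1  # preorder rank, used here as the document-order position


def number(root: Node) -> Node:
    """Assign preorder ranks so nodes can be compared in document order."""
    counter = 0
    stack = [root]
    while stack:
        node = stack.pop()
        node.pre = counter
        counter += 1
        stack.extend(reversed(node.children))
    return root


def child(node: Node) -> List[Node]:
    return list(node.children)


def descendant(node: Node) -> List[Node]:
    out: List[Node] = []
    for c in node.children:
        out.append(c)
        out.extend(descendant(c))
    return out


Axis = Callable[[Node], List[Node]]


def step(context: List[Node], axis: Axis) -> List[Node]:
    """Apply one axis to every context node and concatenate the results."""
    return [m for n in context for m in axis(n)]


def ddo(seq: List[Node]) -> List[Node]:
    """Sort into document order and remove duplicates (the costly operation)."""
    by_rank = {n.pre: n for n in seq}
    return [by_rank[k] for k in sorted(by_rank)]


def eval_naive(root: Node, axes: List[Axis]) -> List[Node]:
    """Always-correct strategy: sort and deduplicate after every step."""
    seq = [root]
    for axis in axes:
        seq = ddo(step(seq, axis))
    return seq


def eval_raw(root: Node, axes: List[Axis]) -> List[Node]:
    """No sorting or deduplication at all; correct only for some paths."""
    seq = [root]
    for axis in axes:
        seq = step(seq, axis)
    return seq


def pres(seq: List[Node]) -> List[int]:
    return [n.pre for n in seq]


# Hypothetical document: a(b(c(d)), e(f))
doc = number(Node("a", [Node("b", [Node("c", [Node("d")])]),
                        Node("e", [Node("f")])]))

print(pres(eval_raw(doc, [child, child])))             # already ordered, no duplicates
print(pres(eval_naive(doc, [child, child])))           # same result, extra work
print(pres(eval_raw(doc, [descendant, descendant])))   # contains a duplicate
print(pres(eval_naive(doc, [descendant, descendant]))) # duplicate removed
```

On this tree, the two `child` steps give the same answer with or without the extra operations, so sorting and deduplicating there is wasted work. The two `descendant` steps, by contrast, reach the same node through two different ancestors, so at least one duplicate elimination is required. Deciding statically which case applies is the problem the paper addresses.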
