Partition Based Path Join Algorithms for XML Data

Path expressions are an important component of XML queries. The extended preorder numbering scheme makes it possible to quickly determine the ancestor-descendant relationship between elements in the hierarchy of XML data. Using this numbering scheme, a path expression can be evaluated by join operations, avoiding the potentially high cost of tree traversals. In this paper, we first formulate XML path queries as range-point join queries. We then discuss partition based algorithms that exploit the range containment property to process range-point join queries efficiently. Under the partition based framework, we propose three algorithms, namely Descendant partition join, Segment-tree partition join and Ancestor Link partition join, from which a query optimizer can choose according to the characteristics of the input data. The experimental results show that the partition based algorithms make better use of buffer memory than sort-merge algorithms, and that the proposed Ancestor Link join algorithm yields the best performance by using small in-memory data structures and by taking advantage of unevenly sized inputs.
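To make the range-point formulation concrete, the following minimal sketch assumes each element carries an (order, size) pair, so that an ancestor covers the range [order, order + size] and a descendant is the single point given by its order. The abstract does not define this encoding, so the encoding, the names Element, is_ancestor and range_point_join, and the sample numbers are all illustrative assumptions. The nested-loop join is only a baseline showing the join semantics; it is not one of the three partition based algorithms proposed in the paper.

```python
from typing import List, NamedTuple, Tuple


class Element(NamedTuple):
    tag: str
    order: int  # assumed extended preorder number of the element
    size: int   # assumed width of the range reserved for its descendants


def is_ancestor(a: Element, d: Element) -> bool:
    """Range containment test: d's point lies inside a's range."""
    return a.order < d.order <= a.order + a.size


def range_point_join(
    ancestors: List[Element], descendants: List[Element]
) -> List[Tuple[Element, Element]]:
    """Naive nested-loop range-point join (baseline, for illustration only)."""
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


# Hypothetical numbering for a query such as chapter//title.
chapters = [Element("chapter", 10, 30), Element("chapter", 50, 30)]
titles = [Element("title", 12, 0), Element("title", 55, 0), Element("title", 90, 0)]

pairs = range_point_join(chapters, titles)
for a, d in pairs:
    print(a.tag, a.order, "contains", d.tag, d.order)
```

Here the title numbered 90 falls outside both chapter ranges, so it does not appear in the result. A partition based algorithm would replace the nested loop with a strategy that avoids comparing every range with every point.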
