Fast Structural Join with a Location Function

A structural join evaluates structural relationship (parent-child or ancestor-descendant) between xml elements. It serves as an important computation unit in xml pattern matching, such as twig joins. There exists many work on efficient structural joins. In particular, indexes can expedite structural joins by skipping unmatchable elements. A typical use of indexes is to retrieve, for a given element, all its ancestor (or descendant) elements from an indexed set. However we observed two possible limitations with such index probes, namely false hit and false locate. A false hit means that an index probe touches unnecessary data besides real results; a false locate stands for a (wasted) probe that has zero answers. Obviously false hit and false locate can affect negatively the efficiency of structural joins. In this paper, we challenge ourselves to develop new structural join algorithm with no false hit and no false locate. We illustrate that R−Tree has the no false hit property (in contrast to B+-Tree) and hence is a good candidate for our algorithm. For no false locate, we propose a new function called Location which tells the probing points that will result in matches. We design and implement the Location function using a space-efficient structure, and present our algorithm using R−Tree together with the Location function. Extensive experiments show the efficiency of our algorithm.

[1]  Jeffrey F. Naughton,et al.  Updates for Structure Indexes , 2002, VLDB.

[2]  Beng Chin Ooi,et al.  XR-tree: indexing XML data for efficient structural joins , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[3]  Carlo Zaniolo,et al.  Efficient Structural Joins on Indexed XML Documents , 2002, VLDB.

[4]  Jignesh M. Patel,et al.  Structural joins: a primitive for efficient XML query pattern matching , 2002, Proceedings 18th International Conference on Data Engineering.