Efficient processing of XML twig patterns with parent child edges: a look-ahead approach

With the growing importance of semi-structure data in information exchange, much research has been done to provide an effective mechanism to match a twig query in an XML database. A number of algorithms have been proposed recently to process a twig query holistically. Those algorithms are quite efficient for quires with only ancestor-descendant edges. But for queries with mixed ancestor-descendant and parent-child edges, the previous approaches still may produce large intermediate results, even when the input and output size are more manageable. To overcome this limitation, in this paper, we propose a novel holistic twig join algorithm, namely <i>TwigStackList</i>. Our main technique is to look-ahead read some elements in input data steams and cache limited number of them to <i>lists</i> in the main memory. The number of elements in any list is bounded by the length of the longest path in the XML document. We show that <i>TwigStackList</i> is I/O optimal for queries with only ancestor-descendant relationships below branching nodes. Further, even when queries contain parent-child relationship below branching nodes, the set of intermediate results in <i>TwigStackList</i> is guaranteed to be a subset of that in previous algorithms. We complement our experimental results on a range of real and synthetic data to show the significant superiority of <i>TwigStackList</i> over previous algorithms for queries with <i>parent</i>-<i>child</i> relationships.

[1]  Jignesh M. Patel,et al.  Structural join order selection for XML query optimization , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[2]  David J. DeWitt,et al.  On supporting containment queries in relational database management systems , 2001, SIGMOD '01.

[3]  Quanzhong Li,et al.  Indexing and Querying XML Data for Regular Path Expressions , 2001, VLDB.

[4]  Hongjun Lu,et al.  Holistic Twig Joins on Indexed XML Documents , 2003, VLDB.

[5]  Jeffrey F. Naughton,et al.  Generalized Search Trees for Database Systems , 1995, VLDB.

[6]  Derick Wood,et al.  On the Optimality of Holistic Algorithms for Twig Queries , 2003, DEXA.

[7]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[8]  Susan B. Davidson,et al.  BLAS: an efficient XPath processing system , 2004, SIGMOD '04.

[9]  Divesh Srivastava,et al.  Holistic twig joins: optimal XML pattern matching , 2002, SIGMOD '02.

[10]  Hongjun Lu,et al.  Efficient Processing of Twig Queries with OR-Predicates. , 2004, ACM SIGMOD Conference.

[11]  Jignesh M. Patel,et al.  Structural joins: a primitive for efficient XML query pattern matching , 2002, Proceedings 18th International Conference on Data Engineering.

[12]  Beng Chin Ooi,et al.  XR-tree: indexing XML data for efficient structural joins , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).