TRACK: A Novel XML Join Algorithm for Efficient Processing Twig Queries

A number of holistic twig join algorithms have been proposed to find all occurrences of a tree (twig) pattern in an XML database. However, most of these algorithms focus on supporting a larger query class or on using a novel labeling scheme to reduce I/O operations, and they ignore the deficiencies of the root-to-leaf strategy. In this paper, we propose a novel twig join algorithm called Track, which adopts the opposite, leaf-to-root strategy to process queries. This strategy brings two benefits: (i) it avoids spending excessive time checking the element index to make sure that all branches are satisfied before a new element is processed, and (ii) it uses a tree structure to encode the final twig matches, which avoids the merging process. Experiments on diverse data sets show that our algorithm outperforms existing algorithms in terms of query processing performance.
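To make the leaf-to-root idea above concrete, the following is a minimal illustrative sketch and not the Track algorithm itself, whose details are not given here. It assumes a small in-memory document tree rather than labeled element streams, supports only ancestor-descendant edges, and uses hypothetical names (`Elem`, `Query`, `Match`, `match_bottom_up`, `count_embeddings`). Elements are visited in post-order, so every branch below an element is already resolved when the element itself is tested, and each match stores its branch matches as a tree, so no separate merge of path solutions is needed.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    """A document element (hypothetical in-memory model)."""
    tag: str
    children: list = field(default_factory=list)


@dataclass
class Query:
    """A twig query node; every edge is assumed to be ancestor-descendant."""
    tag: str
    children: list = field(default_factory=list)


@dataclass
class Match:
    """A matched element; branches[i] lists the matches of the i-th query child."""
    elem: Elem
    branches: list


def match_bottom_up(root, query):
    """Return matches of the query root in document order, resolving leaves first."""
    qnodes = []

    def collect(q):
        qnodes.append(q)
        for c in q.children:
            collect(c)

    collect(query)

    def visit(elem):
        # below[id(q)]: matches of q found at proper descendants of elem.
        below = {id(q): [] for q in qnodes}
        for child in elem.children:
            child_here, child_below = visit(child)
            for q in qnodes:
                below[id(q)].extend(child_here[id(q)])
                below[id(q)].extend(child_below[id(q)])
        # here[id(q)]: a match of q at elem itself, if all branches are satisfied.
        here = {id(q): [] for q in qnodes}
        for q in qnodes:
            if q.tag == elem.tag:
                branches = [list(below[id(c)]) for c in q.children]
                if all(branches):  # every query child has at least one match below
                    here[id(q)].append(Match(elem, branches))
        return here, below

    here, below = visit(root)
    return here[id(query)] + below[id(query)]


def count_embeddings(match):
    """Count the flat twig matches encoded by one Match tree."""
    total = 1
    for branch in match.branches:
        total *= sum(count_embeddings(m) for m in branch)
    return total


# Example: the twig a[.//b][.//c] over a small document.
doc = Elem("a", [Elem("b"), Elem("a", [Elem("b"), Elem("c")]), Elem("d")])
twig = Query("a", [Query("b"), Query("c")])
matches = match_bottom_up(doc, twig)
print([count_embeddings(m) for m in matches])
```

The sketch favors clarity over efficiency: it copies match lists at every level and recurses over the document, which a stream-based algorithm would avoid.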
