Efficient processing of XML containment queries using partition-based schemes

XML query languages provide facilities for querying XML data by both value and structure. A basic operation in processing and optimizing such queries is the containment join, which takes two sets of elements and returns the pairs in which one element is an ancestor (or descendant) of the other. Most techniques proposed so far assume that the two sets are already sorted or rely on preexisting indexes. In contrast, a partition-based technique requires neither indexing nor sorting; instead, it processes the containment join by dividing the input sets into smaller partitions. In this paper, we present a new partition-based scheme that adapts gracefully to different document sizes. An experimental comparison with previous work validates the advantages of our approach. Moreover, the experiments demonstrate that our solution is a viable alternative to non-partition-based join algorithms when the input data is neither sorted nor indexed.
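To make the containment join concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes each element is encoded as a (start, end) region, a common numbering for XML nodes that the abstract does not itself specify, so that an ancestor's region strictly encloses its descendant's. The function names and the fixed partition `width` are hypothetical. The partitioned variant places each descendant in the single partition holding its start position and replicates each ancestor into every partition its region overlaps, so no sorting or index is needed.

```python
from collections import defaultdict

def contains(anc, desc):
    """True if anc's (start, end) region strictly encloses desc's region."""
    return anc[0] < desc[0] and desc[1] < anc[1]

def nested_loop_join(ancestors, descendants):
    """Baseline containment join: test every (ancestor, descendant) pair."""
    return [(a, d) for a in ancestors for d in descendants if contains(a, d)]

def partitioned_join(ancestors, descendants, width):
    """Naive partition-based join over fixed-width position ranges (assumed scheme)."""
    # Each descendant goes to exactly one partition, chosen by its start position.
    desc_parts = defaultdict(list)
    for d in descendants:
        desc_parts[d[0] // width].append(d)

    # Each ancestor is replicated into every partition its region overlaps.
    anc_parts = defaultdict(list)
    for a in ancestors:
        for p in range(a[0] // width, a[1] // width + 1):
            anc_parts[p].append(a)

    # Join partition by partition; a pair is found only in the descendant's partition,
    # so no result is reported twice.
    result = []
    for p, ds in desc_parts.items():
        result.extend(nested_loop_join(anc_parts.get(p, []), ds))
    return result

ancestors = [(1, 20), (2, 9), (10, 19)]
descendants = [(3, 4), (5, 8), (11, 12), (21, 22)]
pairs = partitioned_join(ancestors, descendants, width=5)
```

Replicating ancestors keeps the sketch short but can inflate partitions when regions are wide; handling that cost across different document sizes is the kind of issue a real partitioning scheme has to address.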
