BBTC: A New Update-aware Coding Scheme for Efficient Structure Join

Summary: Identifying ancestor-descendant and parent-child relationships between elements of an XML document is crucial to efficient XML query processing. A popular approach is to assign a code to each node while traversing the document tree. Existing coding schemes, however, either cannot support updates to the XML document or require very large storage space. This paper proposes a novel coding scheme, the Blocked Binary-Tree Coding scheme (BBTC), designed for fast relationship identification, easy update, and low storage cost. BBTC identifies the ancestor-descendant relationship in constant time, and an update requires only a few simple operations on the affected document elements. On top of BBTC, the paper also proposes BDC, a structure join algorithm based on Bucket Divide and Conquer. BDC dramatically accelerates structure joins when the input element sets are neither sorted nor indexed, and it can also be applied to other coding schemes. Extensive experiments show that both BBTC and BDC significantly outperform existing approaches.
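The summary does not spell out how BBTC codes are built, so the following is only a minimal sketch of the general idea behind traversal-based coding, using a generic interval (region) coding rather than BBTC itself. The names `Label`, `label_tree`, `is_ancestor`, and `structural_join` are hypothetical, and the join shown is a plain nested loop over unsorted inputs, not the BDC algorithm.

```python
from typing import List, NamedTuple, Tuple


class Label(NamedTuple):
    """Region code of one element: positions at which it is entered and left."""
    start: int
    end: int


def label_tree(tree) -> List[Tuple[str, Label]]:
    """Assign (start, end) codes in one depth-first traversal.

    `tree` is a nested tuple (name, [children]); this shape is an assumption
    made for the sketch, not a format taken from the paper.
    """
    labels: List[Tuple[str, Label]] = []
    counter = 0

    def visit(node) -> None:
        nonlocal counter
        name, children = node
        counter += 1
        start = counter
        for child in children:
            visit(child)
        counter += 1
        labels.append((name, Label(start, counter)))

    visit(tree)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """Constant-time test: a is an ancestor of d iff a's interval strictly contains d's."""
    return a.start < d.start and d.end < a.end


def structural_join(ancestors: List[Label], descendants: List[Label]):
    """Naive structural join over unsorted, unindexed inputs (quadratic time)."""
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


# Example document: <book><chapter><title/></chapter><title/></book>
tree = ("book", [("chapter", [("title", [])]), ("title", [])])
labels = label_tree(tree)
chapters = [lab for name, lab in labels if name == "chapter"]
titles = [lab for name, lab in labels if name == "title"]
pairs = structural_join(chapters, titles)
```

With plain region codes like these, inserting a new element can force many existing codes to be renumbered, which is the kind of update cost the summary says BBTC is designed to avoid.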
