Reuse or Never Reuse the Deleted Labels in XML Query Processing Based on Labeling Schemes

To facilitate the XML query processing, several kinds of labeling schemes have been proposed. Based on the labeling schemes, the ancestor-descendant and parent-child relationships in XML queries can be quickly determined without accessing the original XML file. Recently, more researches are focused on how to update the labels when nodes are inserted into the XML. However how to process the deleted labels are not discussed previously. We think that the deleted labels can be processed in two different directions: (1) reuse all the deleted labels to control the label size increasing speed and improve the query performance; (2) never reuse the deleted labels to query different versions of the XML data based on labeling schemes. In this paper, we firstly introduce our previous work, called QED, which can completely avoid the re-labeling in XML updates. Secondly based on QED we propose a new algorithm, called Reuse, which can reuse all the deleted labels to control the label size increasing speed; meanwhile the Reuse algorithm can completely avoid the re-labeling also. Thirdly to query different versions of the XML data, we propose another new algorithm, called NeverReuse, which is the only approach that never reuses any deleted labels. Extensive experimental results show that the algorithms proposed in this paper can control the label size increasing speed when reusing all the deleted labels, and is the only approach to query different versions of the XML data based on labeling schemes.

[1]  Tok Wang Ling,et al.  QED: a novel quaternary encoding to completely avoid re-labeling in XML updates , 2005, CIKM '05.

[2]  Carlo Zaniolo,et al.  Efficient Management of Multiversion Documents by Object Referencing , 2001, VLDB.

[3]  Fusheng Wang,et al.  An XML-Based Approach to Publishing and Querying the History of Databases , 2005, World Wide Web.

[4]  Toshiyuki Amagasa,et al.  XRel: a path-based approach to storage and retrieval of XML documents using relational databases , 2001, ACM Trans. Internet Techn..

[5]  Hao He,et al.  BOXes: efficient maintenance of order-based labeling for dynamic XML data , 2005, 21st International Conference on Data Engineering (ICDE'05).

[6]  Quanzhong Li,et al.  Indexing and Querying XML Data for Regular Path Expressions , 2001, VLDB.

[7]  David J. DeWitt,et al.  On supporting containment queries in relational database management systems , 2001, SIGMOD '01.

[8]  Mong-Li Lee,et al.  A Prime Number Labeling Scheme for Dynamic Ordered XML Trees , 2004, ICDE.

[9]  Edith Cohen,et al.  Labeling dynamic XML trees , 2002, PODS '02.

[10]  Leonidas Fegaras,et al.  Data stream management for historical XML data , 2004, SIGMOD '04.

[11]  Tok Wang Ling,et al.  Efficient Processing of Updates in Dynamic XML Data , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[12]  Amélie Marian,et al.  Change-Centric Management of Versions in an XML Warehouse , 2001, VLDB.

[13]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[14]  Jeffrey D. Ullman,et al.  Representative objects: concise representations of semistructured, hierarchical data , 1997, Proceedings 13th International Conference on Data Engineering.

[15]  Carlo Zaniolo,et al.  Supporting complex queries on multiversion XML documents , 2006, TOIT.

[16]  Alexander Borgida,et al.  Efficient management of transitive relationships in large data and knowledge bases , 1989, SIGMOD '89.

[17]  Roy Goldman,et al.  Lore: a database management system for semistructured data , 1997, SGMD.

[18]  Toshiyuki Amagasa,et al.  QRS: a robust numbering scheme for XML documents , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[19]  Patrick E. O'Neil,et al.  ORDPATHs: insert-friendly XML node labels , 2004, SIGMOD '04.

[20]  Hyun Jin Moon,et al.  Managing Multiversion Documents & Historical Databases: a Unified Solution Based on XML , 2005, WebDB.