Efficient updates in dynamic XML data: from binary string to quaternary string

XML query processing based on labeling schemes has been studied thoroughly over the past several years. More recently, efficient processing of updates in dynamic XML data has gained attention. However, all existing techniques have a high update cost, cannot completely avoid re-labeling when XML data is updated, and increase the label size, which degrades query performance. In this paper we therefore propose a novel Compact Dynamic Binary String (CDBS) encoding to process updates efficiently. CDBS has two important properties that form the foundation of this paper: (1) a new CDBS code can be inserted between any two consecutive CDBS codes while preserving their order and without re-encoding the existing codes; (2) CDBS is orthogonal to specific labeling schemes, so it can be applied broadly to different labeling schemes or other applications to process updates efficiently. Because CDBS can encounter the overflow problem, we extend it to the Compact Dynamic Quaternary String (CDQS) encoding, which completely avoids re-labeling in XML leaf node updates regardless of the labeling scheme. We also discuss how to process internal node updates efficiently. Experimental results show that CDBS and CDQS outperform previous approaches for both leaf node and internal node updates.
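
Property (1) can be illustrated with a minimal sketch. It assumes that codes are binary strings ending in "1", compared lexicographically with a prefix ordered before its extensions. The function name `code_between` and the insertion rule are illustrative assumptions, not necessarily the exact algorithm of the paper, and the sketch ignores the storage format and the overflow problem mentioned above.

```python
def code_between(left, right):
    """Return a binary string strictly between `left` and `right`.

    Assumptions (hypothetical, for illustration only):
    - codes are strings over {"0", "1"} that end in "1";
    - order is plain lexicographic string order;
    - left == "" means "no left neighbour", right is None means
      "no right neighbour".
    """
    if right is None:
        # Appending "1" yields a longer string with `left` as a prefix,
        # so it sorts after `left`.
        return left + "1"
    if len(left) >= len(right):
        # `left` cannot be a prefix of `right`, so the two differ at some
        # position; extending `left` keeps the result below `right`.
        return left + "1"
    # `left` is shorter: turn the final "1" of `right` into "01", which
    # sorts just below `right` but still above `left`.
    return right[:-1] + "01"


# Insert repeatedly without touching any existing code.
codes = ["01", "1"]
new_code = code_between(codes[0], codes[1])
codes.insert(1, new_code)
assert codes == sorted(codes)
```

Because a new code is derived only from its two neighbours, existing codes never change; the cost is that codes grow with repeated insertions at the same position.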
