Path-expression Queries over Multiversion XML Documents

In this paper we address the problem of evaluating path-expression queries on multiversion XML documents. On static (i.e., non-versioned) documents, such queries are typically implemented as path joins, using numbering schemes that encode the structural relationships among the document elements. We extend previously proposed pattern matching techniques to support versions. We first present an easily updatable numbering scheme that efficiently captures structural relationships among the elements of a dynamically evolving document. We then propose a variation of PathStack, an optimal pattern matching algorithm, adapted to the characteristics of the multiversion environment. Finally, through an experimental evaluation, we compare two storage techniques in terms of space utilization and query efficiency.
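To make the idea of a numbering scheme concrete, the following is a minimal sketch of how structural relationships can be tested from element labels alone, and how a version interval might restrict a path join to one version. It is not the scheme proposed in the paper: it assumes the common region (start, end, level) encoding, and the names `Element`, `v_from`, `v_to`, `alive_at`, and `path_join`, as well as the sample data, are hypothetical. The join is a naive nested loop, not PathStack.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Element:
    tag: str
    start: int          # position at which the element opens
    end: int            # position at which the element closes
    level: int          # depth in the document tree
    v_from: int         # first version in which the element exists (assumed field)
    v_to: float = inf   # first version in which it no longer exists (assumed field)

    def alive_at(self, version: int) -> bool:
        """True if the element is part of the given document version."""
        return self.v_from <= version < self.v_to


def is_ancestor(a: Element, d: Element) -> bool:
    """a contains d exactly when a's region strictly encloses d's region."""
    return a.start < d.start and d.end < a.end


def is_parent(a: Element, d: Element) -> bool:
    """A parent is an ancestor exactly one level above."""
    return is_ancestor(a, d) and a.level + 1 == d.level


def path_join(ancestors, descendants, version):
    """Naive nested-loop evaluation of a//d restricted to one version."""
    return [
        (a.start, d.start)
        for a in ancestors
        if a.alive_at(version)
        for d in descendants
        if d.alive_at(version) and is_ancestor(a, d)
    ]


# Hypothetical document history. Labels are spaced out so that later
# insertions can be numbered without relabeling existing elements.
chapters = [Element("chapter", 40, 90, 1, v_from=1)]
sections = [
    Element("section", 50, 60, 2, v_from=1, v_to=3),  # deleted in version 3
    Element("section", 70, 80, 2, v_from=2),          # inserted in version 2
]

for v in (1, 2, 3):
    print(v, path_join(chapters, sections, v))
```

Each element carries its own validity interval, so the same label lists answer the query `chapter//section` for any version. A stack-based algorithm such as PathStack would replace the nested loop with a single merge pass over label lists sorted by `start`.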
