Efficient schemes for managing multiversion XML documents

Abstract. Multiversion support for XML documents is needed in many critical applications, such as software configuration control, cooperative authoring, web information warehouses, and "e-permanence" of web documents. In this paper, we introduce efficient and robust techniques for: (i) storing and retrieving; (ii) viewing and exchanging; and (iii) querying multiversion XML documents. We first discuss the limitations of traditional version control methods, such as RCS and SCCS, and then propose novel techniques that overcome their limitations. Initially, we focus on the problem of managing secondary storage efficiently, and introduce an edit-based versioning scheme that enhances RCS with an effective clustering policy based on the concept of page-usefulness. The new scheme drastically improves version retrieval at the expense of a small (linear) space overhead. However, the edit-based approach falls short of achieving objectives (ii) and (iii). Therefore, we introduce and investigate a second scheme, which is reference-based and preserves the structure of the original document. In the reference-based approach, a multiversion document can be represented as yet another XML document, which can be easily exchanged and viewed on the web; furthermore, simple queries are also expressed and supported well under this representation. To achieve objective (i), we extend the page-usefulness clustering technique to the reference-based scheme. After characterizing the asymptotic behavior of the new techniques proposed, the paper presents the results of an experimental study evaluating and comparing their performance.
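
The abstract does not spell out the page-usefulness policy, so the following is only an illustrative sketch under stated assumptions, not the paper's implementation. It assumes that a page's usefulness is the fraction of its capacity occupied by objects still valid in the current version, and that when usefulness drops below a threshold the surviving objects are copied to a fresh page so that a later version can be read from fewer pages. The names `Page`, `delete_object`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A fixed-capacity storage page (hypothetical model)."""
    capacity: int
    objects: dict = field(default_factory=dict)  # object id -> size
    valid: set = field(default_factory=set)      # ids valid in the current version

    def usefulness(self) -> float:
        # Assumed definition: share of the page still holding current objects.
        live = sum(self.objects[o] for o in self.valid)
        return live / self.capacity


def delete_object(pages, page_index, obj_id, threshold):
    """Invalidate obj_id; copy survivors out if the page is no longer useful."""
    page = pages[page_index]
    page.valid.discard(obj_id)
    if page.valid and page.usefulness() < threshold:
        # Simplification: survivors go to a brand-new page.
        fresh = Page(capacity=page.capacity)
        for o in sorted(page.valid):
            fresh.objects[o] = page.objects[o]
            fresh.valid.add(o)
        # The old page keeps its contents, but only for past versions.
        page.valid.clear()
        pages.append(fresh)
    return pages


pages = [Page(capacity=100,
              objects={"a": 40, "b": 30, "c": 20},
              valid={"a", "b", "c"})]
delete_object(pages, 0, "a", threshold=0.5)  # usefulness 0.5, page stays useful
delete_object(pages, 0, "b", threshold=0.5)  # usefulness 0.2, "c" is copied out
```

Copying duplicates some objects, which is one way to read the "small (linear) space overhead" traded for faster version retrieval; the exact bound depends on details given in the paper, not in this sketch.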
