Temporal queries and version management in XML-based document archives

By storing the successive versions of a document incrementally, XML repositories and data warehouses achieve (i) the efficient preservation of critical information and (ii) the ability to support historical queries on the evolution of documents and their contents. In this paper, we present efficient techniques for managing multi-version document histories and supporting powerful temporal queries on such documents. Our approach consists of (i) concisely representing the successive versions of a document as an XML document that implements a temporally grouped data model and (ii) using XML query languages, such as XQuery, to express complex queries on the content of a particular version and on the temporal evolution of the document's elements and contents. We show that the data definition and manipulation framework of XML and XQuery can effectively support temporal models and historical queries without requiring extensions to the current standards; in fact, this approach is also effective at representing and querying the histories of relational database tables, histories that are difficult to manage using SQL. These conclusions emerge from a number of case studies presented in this paper, which include W3C documents, the UCLA course catalog, and the CIA World Factbook.
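
To make the idea of a temporally grouped representation more concrete, the following is a minimal sketch, not the paper's implementation. It assumes each element carries hypothetical `tstart` and `tend` attributes marking the interval in which it belonged to the document, with `9999-12-31` standing in for "still current". Python's standard `xml.etree.ElementTree` is used here in place of XQuery, purely for illustration; the sample document and the helper names `valid_at`, `snapshot`, and `title_history` are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical temporally grouped history: every element records the
# interval [tstart, tend] during which it was part of the document.
# Attribute names and the "9999-12-31" end marker are assumptions.
HISTORY = """
<document tstart="2001-01-01" tend="9999-12-31">
  <chapter tstart="2001-01-01" tend="9999-12-31">
    <title tstart="2001-01-01" tend="2002-06-30">Introduction</title>
    <title tstart="2002-07-01" tend="9999-12-31">Overview</title>
  </chapter>
  <chapter tstart="2001-01-01" tend="2001-12-31">
    <title tstart="2001-01-01" tend="2001-12-31">Old Appendix</title>
  </chapter>
</document>
"""


def valid_at(elem, date):
    """True if `elem` existed on `date` (ISO dates compare as strings)."""
    return elem.get("tstart") <= date <= elem.get("tend")


def snapshot(elem, date):
    """Rebuild the version of `elem` that was current on `date`.

    Returns a copy without timestamps, or None if `elem` did not exist.
    """
    if not valid_at(elem, date):
        return None
    copy = ET.Element(elem.tag)
    copy.text = elem.text
    for child in elem:
        snap = snapshot(child, date)
        if snap is not None:
            copy.append(snap)
    return copy


def title_history(root):
    """Temporal-evolution query: every title with its validity interval."""
    return [(t.text, t.get("tstart"), t.get("tend")) for t in root.iter("title")]


root = ET.fromstring(HISTORY)

# Snapshot queries: the content of one particular version.
titles_2001 = [t.text for t in snapshot(root, "2001-06-01").iter("title")]
titles_2003 = [t.text for t in snapshot(root, "2003-01-01").iter("title")]

# History query: how the titles changed over time.
history = title_history(root)
```

In this sketch, a snapshot query filters elements by their interval, while a history query simply reads the intervals stored alongside each value. Both kinds of query run against one document rather than against a chain of deltas, which is the property the temporally grouped model is meant to provide.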
