XEM: XML Evolution Management

As information on the World Wide Web continues to proliferate at an astounding rate, the Extensible Markup Language (XML) has emerged as a standard format for data representation on the web. In many application domains, specific document type definitions (DTDs) are designed to enforce a semantically agreed-upon structure on XML documents. In the XML context, these structural definitions serve as schemata. However, both the data and the structure (schema) of XML documents tend to change over time for a multitude of reasons, including correcting design errors in the DTD, expanding the application scope, or merging several businesses into one. Most current software tools that enable the use of XML do not provide explicit support for such data or schema changes. Using these tools in a changing environment entails making manual edits to DTDs and XML data and reloading them from scratch. In this vein, we put forth the first solution framework, called XML Evolution Manager (XEM), to manage the evolution of DTDs and XML documents. XEM provides a minimal yet complete taxonomy of basic change primitives. These primitives, classified as either data or schema changes, are consistency-preserving. For a data change, they ensure that the modified XML document conforms to its DTD in both structure and constraints. For a schema change, they ensure that the new DTD is well-formed and that all existing XML documents are also transformed to conform to the modified DTD. We prove both the completeness of our evolution taxonomy and its consistency-preserving nature. To verify the feasibility of the XEM approach, we have implemented a working prototype system in Java, using IBM's XML4J parser and PSE Pro as the back-end storage system. We present an experimental study on this system that compares the relative efficiencies of the primitive operations in terms of their execution times.
We then contrast these execution times with the time needed to reload the data, as a manual system would require. Based on the results of these experiments, we conclude that our approach improves upon the previous practice of making manual changes and reloading data from scratch by providing automated evolution management facilities for DTDs and XML documents.
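
The consistency-preserving behavior of the two kinds of primitives can be illustrated in a few lines. What follows is a minimal Python sketch under strong simplifying assumptions, not XEM's Java implementation: a DTD is reduced to flat sequence content models with distinct child names, and the names `conforms`, `insert_data_element`, and `add_required_child` are hypothetical, chosen only for illustration. The data change is applied and rolled back if the document stops conforming; the schema change first checks that the modified DTD stays well-formed, then transforms every stored document to match it.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not XEM's actual data structures: a DTD maps each
# element name to a *sequence* content model of (child_name, quantifier) pairs,
# with quantifier in {"1", "?", "*", "+"}. Child names within one content model
# are assumed distinct; choice groups, #PCDATA and attributes are omitted.
DTD = dict[str, list[tuple[str, str]]]


@dataclass
class Element:
    name: str
    children: list["Element"] = field(default_factory=list)


def conforms(elem: Element, dtd: DTD) -> bool:
    """Recursively check that elem and its subtree match the toy DTD."""
    if elem.name not in dtd:
        return False
    names = [c.name for c in elem.children]
    i = 0
    for child, quant in dtd[elem.name]:
        count = 0
        while i < len(names) and names[i] == child:  # consume a run of `child`
            i += 1
            count += 1
        if (quant == "1" and count != 1) or (quant == "?" and count > 1) \
                or (quant == "+" and count < 1):
            return False
    # Children left over were not allowed by the content model.
    return i == len(names) and all(conforms(c, dtd) for c in elem.children)


def insert_data_element(root: Element, parent: Element, pos: int,
                        new: Element, dtd: DTD) -> None:
    """Data change: insert `new`, but roll back if the document stops conforming."""
    parent.children.insert(pos, new)
    if not conforms(root, dtd):
        del parent.children[pos]
        raise ValueError(f"inserting <{new.name}> would violate the DTD")


def add_required_child(dtd: DTD, docs: list[Element],
                       parent_type: str, child_type: str) -> None:
    """Schema change: append a new, empty element type `child_type` with
    quantifier "1" to `parent_type`'s content model, then transform every
    stored document so it still conforms to the modified DTD."""
    if parent_type not in dtd or child_type in dtd:
        raise ValueError("change would not yield a well-formed DTD")
    dtd[child_type] = []                        # declare the new type (empty content)
    dtd[parent_type].append((child_type, "1"))  # extend the content model

    def repair(elem: Element) -> None:
        for c in elem.children:
            repair(c)
        if elem.name == parent_type:            # give each instance the new child
            elem.children.append(Element(child_type))

    for doc in docs:
        repair(doc)


# Usage: one accepted and one rejected data change, then a schema change.
dtd: DTD = {"book": [("title", "1"), ("author", "+")], "title": [], "author": []}
doc = Element("book", [Element("title"), Element("author")])
insert_data_element(doc, doc, 2, Element("author"), dtd)    # accepted: author is "+"
try:
    insert_data_element(doc, doc, 0, Element("title"), dtd)  # rejected: title is "1"
except ValueError as err:
    print(err)  # inserting <title> would violate the DTD
add_required_child(dtd, [doc], "book", "isbn")
print(conforms(doc, dtd))  # True
```

Re-validating the whole document after each insert is the simplest way to show the guarantee; a real system would more plausibly check only the affected parent's content model and the new subtree, since the rest of the tree is unchanged.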