An automatic load/extract scheme for XML documents through object-relational repositories

Extensible Markup Language (XML), a simplified version of Standard Generalized Markup Language (SGML), is designed to enable electronic text interchange over the Internet. XML documents have a rigorously described structure that can be analyzed by computers and easily understood by humans. Most current approaches store XML documents in file systems or in relational database systems. However, the nature and design of a file system or a relational database schema can make it a poor fit for the structure of XML documents. In this paper, we present an automatic load/extract scheme to store and retrieve XML documents through object-relational databases. We propose an architecture, called the XML meta-generator (XMG), which reads a specific document type definition (DTD) and automatically generates three components: 1. OR-Schema: the corresponding object-relational database schema, in UniSQL/X format, for the specific DTD. 2. DI-Decomposer: a module that decomposes XML document instances (DIs) according to the specific DTD format and stores the elements in the corresponding object-relational database. 3. DI-Reconstructor: a module that retrieves elements from the object-relational database and reassembles them to recover the original DI. These modules allow XML documents to be decomposed into and reconstructed from object-relational databases automatically and seamlessly. Moreover, documents stored in object-relational databases can be managed and queried more easily than documents stored in file systems or relational databases. Applications such as digital libraries, data warehouses, and data or text mining systems can also be built easily on top of the target database.
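
The three generated components can be illustrated with a minimal Python sketch. It is not the XMG implementation: the function names (`parse_dtd`, `generate_schema`, `decompose`, `reconstruct`) are hypothetical, the `CREATE CLASS` statements are illustrative DDL rather than verified UniSQL/X syntax, and an in-memory list of dictionaries stands in for the object-relational database. The sketch also assumes a very restricted DTD: sequence-only content models, no attributes, and each child element appearing at most once.

```python
import re
import xml.etree.ElementTree as ET

# Matches simple declarations such as <!ELEMENT book (title, author)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)[*+?]?\s*>")


def parse_dtd(dtd_text):
    """Map each element name to its child names ([] for #PCDATA leaves)."""
    model = {}
    for name, content in ELEMENT_DECL.findall(dtd_text):
        children = [c.strip().rstrip("*+?") for c in content.split(",")]
        model[name] = [] if children == ["#PCDATA"] else children
    return model


def generate_schema(model):
    """Emit one illustrative class definition per non-leaf element."""
    statements = []
    for name, children in model.items():
        if not children:
            continue  # leaf elements become plain string columns
        cols = []
        for child in children:
            # A non-leaf child is typed by its own class (object reference).
            col_type = child if model.get(child) else "VARCHAR"
            cols.append(f"{child} {col_type}")
        statements.append(f"CREATE CLASS {name} ({', '.join(cols)});")
    return statements


def decompose(xml_text, model):
    """Split a document instance into rows; nested objects become row indexes."""
    rows = []

    def visit(elem):
        if not model.get(elem.tag):
            return elem.text or ""  # leaf: store its text
        row = {"_class": elem.tag}
        for child in elem:
            row[child.tag] = visit(child)
        rows.append(row)
        return len(rows) - 1  # integer reference to the stored row

    root_ref = visit(ET.fromstring(xml_text))
    return rows, root_ref


def reconstruct(rows, ref):
    """Rebuild the element tree from the stored rows, starting at row `ref`."""
    row = rows[ref]
    elem = ET.Element(row["_class"])
    for tag, value in row.items():
        if tag == "_class":
            continue
        if isinstance(value, int):  # reference to a nested object
            elem.append(reconstruct(rows, value))
        else:
            ET.SubElement(elem, tag).text = value
    return elem
```

With a DTD that declares `book (title, author)` and `author (name, email)`, `generate_schema` produces one class for `book` and one for `author`, and `reconstruct(*decompose(doc, model))` rebuilds the original element tree. A real generator would also have to handle repetition, alternatives, attributes, and mixed content, which this sketch leaves out.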
