Adaptive XML Shredding: Architecture, Implementation, and Challenges

As XML data becomes central to business-critical applications, there is a growing need for efficient and reliable XML storage. Two main approaches have been proposed for storing XML data: native and colonial systems. Native systems (e.g., [19], [20]) are designed from the ground up specifically for XML and XML query languages. Colonial systems (e.g., [5],[7], [19]), on the other hand, attempt to reuse existing commercial database systems (DBMS) by mapping XML into the underlying model used by the DBMS. Colonial systems can thus leverage features, such as concurrency control, crash recovery, scalability, and highly optimized query processors available in the DMBS, making them an attractive alternative for managing XML data. However, several technical challenges need to be addressed in terms of architecture, algorithms, and implementation of these systems.In this paper, we described how these issues are addressed in the context of colonial systems that use relational databases as the underlying DBMS.

[1]  David J. DeWitt,et al.  Relational Databases for Querying XML Documents: Limitations and Opportunities , 1999, VLDB.

[2]  Michael J. Carey,et al.  XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents , 2000, VLDB.

[3]  Prasan Roy,et al.  Efficient and extensible algorithms for multi query optimization , 1999, SIGMOD '00.

[4]  Daniela Florescu,et al.  A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database , 1999 .

[5]  Goetz Graefe,et al.  The Volcano optimizer generator: extensibility and efficient search , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[6]  Gregory Piatetsky-Shapiro,et al.  Accurate estimation of the number of tuples satisfying a condition , 1984, SIGMOD '84.

[7]  Daniel C. Zilio,et al.  DB2 advisor: an optimizer smart enough to recommend its own indexes , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[8]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[9]  Juliana Freire,et al.  From XML schema to relations: a cost-based approach to XML storage , 2002, Proceedings 18th International Conference on Data Engineering.

[10]  Surajit Chaudhuri,et al.  Materialized view and index selection tool for Microsoft SQL server 2000 , 2001, SIGMOD '01.

[11]  Dan Suciu,et al.  SilkRoute: trading between relations and XML , 2000, Comput. Networks.

[12]  Alin Deutsch,et al.  Storing semistructured data with STORED , 1999, SIGMOD '99.

[13]  Juliana Freire,et al.  LegoDB: Customizing Relational Storage for XML Documents , 2002, VLDB.

[14]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[15]  Daniela Florescu,et al.  Quilt: An XML Query Language for Heterogeneous Data Sources , 2000, WebDB.

[16]  Juliana Freire,et al.  StatiX: making XML count , 2002, SIGMOD '02.