Querying XML documents by dynamic shredding

With the wide adoption of XML as a standard data representation and exchange format, querying XML documents becomes increasingly important. However, relational database systems constitute a much more mature technology than what is available for native storage of XML. To bridge the gap, one way to manage XML data is to use a commercial relational database system. In this approach, users typically first "shred" their documents by isolating what they predict to be meaningful fragments, then store the individual fragments according to some relational schema, and later translate each XML query (e.g., expressed in W3C's XQuery) to SQL queries expressed against the shredded documents. In this paper, we propose an alternative approach that builds on relational database technology but shreds XML documents dynamically. This avoids many of the problems in maintaining document order and reassembling compound data from its fragments. We then present an algorithm to translate a significant subset of XQuery into an extended relational algebra that includes operators defined for the structured text datatype. This algorithm can be used as the basis of a sound translation from XQuery to SQL and as the starting point for query optimization, which is required for XML to be supported by relational database technology.
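
For readers unfamiliar with the conventional pipeline described above, the following is a minimal sketch of static shredding, not of the dynamic approach proposed in the paper. It assumes a generic "edge table" schema (the table `node` and its columns `id`, `parent`, `ord`, `tag`, `text` are hypothetical names chosen for illustration), stores a small sample document in an in-memory SQLite database, and answers the path `/bib/book/title` with a hand-written SQL query. Note that document order has to be carried explicitly in the `ord` column, which hints at the order-maintenance burden mentioned in the abstract.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = (
    "<bib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book>"
    "</bib>"
)


def shred(xml_text):
    """Return (id, parent, ord, tag, text) rows, one per element, in document order."""
    rows = []
    counter = 0

    def visit(elem, parent_id, ordinal):
        nonlocal counter
        counter += 1
        node_id = counter
        rows.append((node_id, parent_id, ordinal, elem.tag, (elem.text or "").strip()))
        # 'ordinal' records each child's position among its siblings.
        for i, child in enumerate(elem, start=1):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 1)
    return rows


def load(rows):
    """Store the shredded rows in an in-memory relational table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, ord INTEGER, tag TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def book_titles(conn):
    """Hand-translated SQL counterpart of the path /bib/book/title."""
    sql = """
        SELECT t.text
        FROM node AS b
        JOIN node AS k ON k.parent = b.id AND k.tag = 'book'
        JOIN node AS t ON t.parent = k.id AND t.tag = 'title'
        WHERE b.parent IS NULL AND b.tag = 'bib'
        ORDER BY k.ord, t.ord
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    connection = load(shred(DOC))
    print(book_titles(connection))
```

Each path step becomes a self-join on the edge table, and reassembling a whole `book` element would need further joins, which is the kind of cost that motivates alternatives to shredding ahead of time.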
