XML design for relational storage

Design principles for XML schemas that eliminate redundancies and avoid update anomalies have been studied recently. Several normal forms, generalizing those for relational databases, have been proposed. All of them, however, are based on the assumption of a native XML storage, while in practice most XML data is stored in relational databases. In this paper we study XML design and normalization for relational storage of XML documents. To be able to relate and compare XML and relational designs, we use an information-theoretic framework that measures information content in relations and documents, with higher values corresponding to lower levels of redundancy. We show that most common relational storage schemes preserve the notion of being well-designed (i.e., anomaly- and redundancy-free). Thus, existing XML normal forms guarantee well-designed relational storage as well. We further show that if this perfect option is not achievable, then a slight restriction on XML constraints guarantees a "second-best" relational design, according to possible values of the information-theoretic measure. We finally consider an edge-based relational representation of XML documents, and show that while its information-theoretic properties are similar to those of other relational representations, it can behave significantly worse in terms of enforcing integrity constraints.
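
To make the idea of an edge-based relational representation concrete, the following is a minimal sketch of how an XML document could be shredded into a single edge table. It is an illustration only: the function name `shred_to_edges`, the column layout `(node_id, parent_id, label, value)`, the `@` prefix for attributes, and the sample document are all assumptions made here, and the exact encoding studied in the paper may differ.

```python
import xml.etree.ElementTree as ET


def shred_to_edges(xml_text):
    """Shred an XML document into rows (node_id, parent_id, label, value).

    Hypothetical layout: one row per element and per attribute;
    attribute labels are prefixed with '@'. The root has parent_id None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    next_id = 0

    def visit(elem, parent_id):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        # Store trimmed element text, or None when the element has no text.
        text = (elem.text or "").strip() or None
        rows.append((node_id, parent_id, elem.tag, text))
        # Each attribute becomes a child row of its element.
        for name, value in elem.attrib.items():
            attr_id = next_id
            next_id += 1
            rows.append((attr_id, node_id, "@" + name, value))
        # Recurse into child elements in document order.
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return rows


# Assumed sample document, not taken from the paper.
doc = '<db><course cno="c1"><title>Databases</title></course></db>'
edges = shred_to_edges(doc)
for row in edges:
    print(row)
```

In a table like this, every element type shares the same relation, so a constraint about one kind of element has to be phrased over rows selected by `label` and joined through `parent_id`. That gives some intuition for why enforcing integrity constraints may be harder in an edge-based representation; the paper's formal argument is not reproduced here.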
