A normal form for XML documents

This paper takes a first step towards a design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information and may be prone to update anomalies, and that such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one that avoids these problems. We first introduce the concept of a functional dependency for XML and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
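To make the idea of a functional dependency among paths concrete, the following is a minimal sketch, not the paper's formal definition. It assumes a hypothetical document of courses and enrolled students, flattens each student element into one tuple (a simplified stand-in for the relational representation mentioned above), and checks whether tuples that agree on one set of fields also agree on another. The document contents and the helper names `student_rows` and `fd_holds` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the name of student st1 is stored once per course,
# which is the kind of redundancy a functional dependency can expose.
DOC = """
<courses>
  <course cno="csc200">
    <student sno="st1"><name>Deere</name><grade>A+</grade></student>
    <student sno="st2"><name>Smith</name><grade>B-</grade></student>
  </course>
  <course cno="mat100">
    <student sno="st1"><name>Deere</name><grade>A</grade></student>
  </course>
</courses>
"""

def student_rows(xml_text):
    """Flatten each student element into one dict of path values."""
    root = ET.fromstring(xml_text)
    rows = []
    for course in root.findall("course"):
        for student in course.findall("student"):
            rows.append({
                "cno": course.get("cno"),
                "sno": student.get("sno"),
                "name": student.findtext("name"),
                "grade": student.findtext("grade"),
            })
    return rows

def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on all lhs fields also agree on all rhs fields."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = student_rows(DOC)
print(fd_holds(rows, ["sno"], ["name"]))   # student number determines name
print(fd_holds(rows, ["sno"], ["grade"]))  # but not grade, which varies by course
```

In this sketch the dependency from student number to name holds while the name is repeated under every course the student takes, so renaming a student would require updating several elements. Moving names into a separate element keyed by student number is the kind of restructuring a normal form is meant to guide.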
