Theory of non-first normal form relational databases (null values, nesting)

The advent of sophisticated software tools running on low-cost, powerful computers has prodded the database community into moving beyond the traditional data processing applications for which database systems were originally designed. Office forms, computer-aided design, and statistical database systems are but a few of the new applications that require new approaches to database design and implementation. The foremost model for database use in the last decade has been the relational model. One of its primary assumptions is that all relations must be in first normal form; that is, all values must be non-decomposable units. This assumption unduly constrains our ability to model data, especially for the non-traditional applications that are taxing current database systems. This research extends relational database theory by relaxing the assumption that all relations in the database must be in first normal form. Relations whose attributes may be atomic-valued or relation-valued are said to be in non-first normal form (non-1NF). In this context, we develop a non-1NF model and an extended formal query language based on the relational calculus, and prove that this language is equivalent to a relational algebra extended with nest and unnest operators to deal with non-1NF relations. We define a property that non-1NF relations should satisfy, called partitioned normal form (PNF), and develop a set of extended algebra operators to manipulate non-1NF relations and maintain the PNF property. Our model and the extended operators are then further extended to deal with null values and empty nested relations. We present a user-oriented non-1NF query language, called SQL/NF, which is based on the SQL commercial database language and a proposed relational database language standard.
Finally, we present a method for achieving nested normal form, a form that eliminates anomalies due to partial and transitive dependencies in PNF relations. The method differs from previous algorithms by building non-1NF relations from an initial fourth normal form decomposition, incorporating embedded multivalued dependencies into the design, and improving upon the use of functional dependencies.
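
As an informal illustration of the nest and unnest operators mentioned in this abstract, the following Python sketch models a relation as a list of dictionaries and a relation-valued attribute as a nested list of dictionaries. The function names, the representation, and the student/course data are assumptions made for this example only; they are not the formal definitions developed in the work.

```python
def nest(relation, nested_attrs, new_name):
    """Group rows that agree on every attribute outside nested_attrs and
    collect the nested_attrs values into a sub-relation named new_name."""
    groups = {}
    for row in relation:
        # The grouping key is the set of remaining (atomic) attribute values.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in nested_attrs))
        sub = {k: row[k] for k in nested_attrs}
        bucket = groups.setdefault(key, [])
        if sub not in bucket:  # sub-relations are sets: skip duplicates
            bucket.append(sub)
    return [{**dict(key), new_name: subs} for key, subs in groups.items()]


def unnest(relation, name):
    """Flatten the relation-valued attribute `name` back into atomic columns.
    A row whose nested relation is empty contributes no output rows."""
    out = []
    for row in relation:
        rest = {k: v for k, v in row.items() if k != name}
        for sub in row[name]:
            out.append({**rest, **sub})
    return out


# Hypothetical first normal form relation: one row per (student, course) pair.
flat = [
    {"student": "ann", "course": "db"},
    {"student": "ann", "course": "os"},
    {"student": "bob", "course": "db"},
]

# Non-1NF relation: one row per student, with a relation-valued attribute.
nested = nest(flat, ["course"], "courses")
restored = unnest(nested, "courses")
```

In this sketch, unnesting a relation that was just nested returns the original rows. The comment on empty nested relations hints at why such cases, along with null values, need separate treatment: a plain unnest silently drops the enclosing row.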