Database Normalization as a By-product of Minimum Message Length Inference

Database normalization is a central part of database design, in which the stored data is reorganized so that insertions, deletions and modifications give rise to as few anomalies as possible. Each successive normalization of a database to a higher normal form further reduces the potential for such anomalies. We show here that database normalization follows as a consequence (or special case, or by-product) of the Minimum Message Length (MML) principle of machine learning and inductive inference. In other words, someone previously oblivious to database normalization but well-versed in MML could examine a database and, using MML considerations alone, normalize it, and even discover the notion of attribute inheritance.
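To give a rough feel for the claim, the sketch below compares a crude encoding cost for one flat table against the cost of the same data split into two tables. It is only an illustration under strong assumptions, not the coding scheme used in the paper: the cost function `table_bits`, the toy (student, course, lecturer) data and the rule that each cell costs log2 of the number of distinct values in its column are all hypothetical, and the cost of stating the schema itself (the first part of a two-part MML message) is ignored.

```python
from math import log2


def table_bits(rows):
    """Crude data cost (an assumption for illustration): every cell costs
    log2(number of distinct values in its column) bits."""
    if not rows:
        return 0.0
    columns = list(zip(*rows))
    bits_per_row = sum(log2(len(set(col))) for col in columns)
    return len(rows) * bits_per_row


# Hypothetical flat table in which course determines lecturer,
# so the lecturer is repeated in every enrolment row.
flat = [
    ("s1", "c1", "l1"),
    ("s2", "c1", "l1"),
    ("s3", "c1", "l1"),
    ("s1", "c2", "l2"),
    ("s3", "c2", "l2"),
    ("s4", "c2", "l2"),
]

# Decomposition along the dependency course -> lecturer.
enrolment = sorted({(s, c) for s, c, _ in flat})
teaches = sorted({(c, l) for _, c, l in flat})

flat_bits = table_bits(flat)
normalized_bits = table_bits(enrolment) + table_bits(teaches)

print(f"flat: {flat_bits} bits, normalized: {normalized_bits} bits")
```

On this toy data the decomposed form is cheaper (22 bits against 24), because the lecturer is stated once per course rather than once per enrolment. A full MML treatment would also charge for describing the model, so the saving only appears once the redundancy removed outweighs that extra cost.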
