Economic Incentives for Database Normalization

Abstract

Database systems are central to business information processing, and the relational model is the conceptual basis for most commercial database managers. Little research exists on the cost-effectiveness of relational database normalization, but anecdotal evidence suggests that normalization-induced fragmentation may create inefficiencies. The supply of and demand for normalization are investigated given management policies for response time, database capacity, and deletion. On the supply side, normalization reduces the costs associated with insertion, deletion, and change anomalies. The expected cost of removing change anomalies is linearly proportional both to minimum database size and to database capacity. The occurrence rates of insertion and deletion anomalies are shown to be moderate for all but microcomputer-sized databases. However, because insertion and deletion anomalies tend to be costly, even small probabilities of occurrence can result in significant costs. On the demand side, normalization can create retrieval inefficiencies when a comparatively small amount of information is sought and retrieved from the database. Increases in clustering and in database size both exacerbate these inefficiencies, and the result can be fragmentation inefficiencies and information overload. It is suggested that normalization reduces the opportunity cost of information retrieval from a database by improving recall, and that this reduction is most pronounced when recall is low. Where retrieval rates are high relative to update events, the database fragmentation caused by normalization imposes costs on end users through slower retrieval response.
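The trade-off between anomaly avoidance and retrieval fragmentation can be illustrated with a small sketch. This is not taken from the paper: the table names, columns, and sample rows below are hypothetical, and the example uses Python's standard `sqlite3` module only to show what a change anomaly looks like in an unnormalized table and why the normalized design needs a join at retrieval time.

```python
import sqlite3

# Hypothetical schema and data, chosen only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized: the supplier's city is repeated on every order row.
cur.execute(
    "CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, supplier TEXT, city TEXT)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?)",
    [(1, "Acme", "Boston"), (2, "Acme", "Boston"), (3, "Bolt", "Austin")],
)

# Change anomaly: updating only one of the repeated rows leaves
# two different cities recorded for the same supplier.
cur.execute("UPDATE orders_flat SET city = 'Denver' WHERE order_id = 1")
flat_city_count = len(
    cur.execute(
        "SELECT DISTINCT city FROM orders_flat WHERE supplier = 'Acme'"
    ).fetchall()
)

# Normalized: the city is stored once per supplier, so one update suffices.
cur.execute("CREATE TABLE suppliers (supplier TEXT PRIMARY KEY, city TEXT)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "supplier TEXT REFERENCES suppliers(supplier))"
)
cur.executemany(
    "INSERT INTO suppliers VALUES (?, ?)", [("Acme", "Boston"), ("Bolt", "Austin")]
)
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Acme"), (2, "Acme"), (3, "Bolt")])
cur.execute("UPDATE suppliers SET city = 'Denver' WHERE supplier = 'Acme'")

# Retrieval now has to reassemble the fragments with a join.
joined = cur.execute(
    "SELECT o.order_id, s.city FROM orders o "
    "JOIN suppliers s ON o.supplier = s.supplier "
    "WHERE s.supplier = 'Acme' ORDER BY o.order_id"
).fetchall()
conn.close()
```

In this toy case the flat table ends up with inconsistent cities for one supplier, while the normalized tables stay consistent at the price of a join on every retrieval. That is the kind of update-versus-retrieval tension the abstract describes, though the paper's own cost model is not reproduced here.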
