Enhancing database correctness: a statistical approach

In this paper, we introduce a new type of integrity constraint, which we call a statistical constraint, and discuss how it can enhance database correctness. Statistical constraints capture relationships embedded among the current attribute values in the database and are probabilistic in nature. They can be used to detect potential errors that conventional constraints do not easily detect. We describe methods for extracting statistical constraints from a relation and for enforcing them. A preliminary performance evaluation of enforcing statistical constraints on a real-life database is also presented.
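
To make the idea concrete, the following is a minimal sketch of what extracting and enforcing a statistical constraint could look like. The abstract does not specify the extraction or enforcement methods, so everything here is an assumption made for illustration: the constraint is modeled as a simple least-squares line between two numeric attributes, a tuple is flagged when its residual exceeds k residual standard deviations, and the names `extract_constraint`, `violates`, and `k` are hypothetical.

```python
import math
from statistics import mean


def extract_constraint(rows, x_attr, y_attr):
    """Fit y = intercept + slope * x over the current tuples (assumed model).

    `rows` is a list of dicts standing in for a relation. The returned dict
    is a hypothetical representation of a statistical constraint.
    """
    n = len(rows)
    if n < 3:
        raise ValueError("need at least 3 tuples to estimate residual spread")
    xs = [r[x_attr] for r in rows]
    ys = [r[y_attr] for r in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x attribute is constant; no linear relationship to fit")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))
    return {"x": x_attr, "y": y_attr, "slope": slope,
            "intercept": intercept, "sigma": sigma}


def violates(constraint, row, k=3.0):
    """Return True if the tuple deviates from the fitted relationship by more
    than k residual standard deviations (a potential error, not a certain one)."""
    predicted = constraint["intercept"] + constraint["slope"] * row[constraint["x"]]
    return abs(row[constraint["y"]] - predicted) > k * constraint["sigma"]
```

In this sketch a violation only marks a tuple as suspicious, which matches the probabilistic character of the constraint: an update that trips the check might be routed for review rather than rejected outright. The threshold `k` trades false alarms against missed errors and would need tuning per relation.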
