A framework for understanding existing databases

We propose a framework for a broad class of data mining algorithms for understanding existing databases: functional and approximate dependency inference, minimal key inference, example relation generation, and normal form tests. We point out that the common data-centric step of all these algorithms is the discovery of agree sets, where the agree set of two tuples is the set of attributes on which they have equal values. We propose a set-oriented approach, based on SQL queries, for discovering agree sets from database relations, and we report experiments comparing it with a data mining approach. We also present a novel way to extract approximate functional dependencies with minimal errors from agree sets.
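To make the role of agree sets concrete, here is a minimal sketch, not the authors' actual queries. It computes agree sets with a single SQL self-join in SQLite, then reuses them for two of the tasks listed above: testing a functional dependency and testing a candidate key. The names `agree_sets_sql`, `fd_holds`, `is_superkey`, and the `staff` table are hypothetical. The sketch assumes table and column names are trusted, since they are interpolated into the SQL text. It also assumes SQL equality semantics, so a NULL never agrees with anything.

```python
import sqlite3


def agree_sets_sql(conn, table, attrs, key="rowid"):
    """Hypothetical sketch: agree sets of all tuple pairs via one self-join.

    Each CASE yields 1 when both tuples have equal values on an attribute
    (NULL = NULL evaluates to NULL, so NULLs count as disagreeing).
    Identifiers are interpolated directly, so they must be trusted.
    """
    flags = ", ".join(
        f"CASE WHEN t1.{a} = t2.{a} THEN 1 ELSE 0 END" for a in attrs
    )
    query = (
        f"SELECT {flags} FROM {table} AS t1 JOIN {table} AS t2 "
        f"ON t1.{key} < t2.{key}"  # each unordered pair of tuples once
    )
    result = set()
    for row in conn.execute(query):
        result.add(frozenset(a for a, flag in zip(attrs, row) if flag))
    return result


def fd_holds(agree_sets, lhs, rhs):
    """X -> A holds iff every agree set containing X also contains A."""
    x = frozenset(lhs)
    return all(rhs in ag for ag in agree_sets if x <= ag)


def is_superkey(agree_sets, attrs):
    """X is a (super)key iff no two distinct tuples agree on all of X."""
    x = frozenset(attrs)
    return not any(x <= ag for ag in agree_sets)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (emp TEXT, dept TEXT, mgr TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?)",
        [("e1", "sales", "ann"), ("e2", "sales", "ann"),
         ("e3", "dev", "bob"), ("e4", "dev", "bob")],
    )
    ags = agree_sets_sql(conn, "staff", ["emp", "dept", "mgr"])
    print(fd_holds(ags, ["dept"], "mgr"))  # True
    print(fd_holds(ags, ["mgr"], "emp"))   # False
    print(is_superkey(ags, ["emp"]))       # True
```

The self-join pushes the pairwise comparison into the database engine, which is the spirit of a set-oriented approach. After that, the dependency and key tests only touch the (usually much smaller) set of distinct agree sets, not the relation itself.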
