FD_Mine: discovering functional dependencies in a database using equivalences

The discovery of functional dependencies (FDs) from databases has recently become a significant research problem. In this paper, we propose a new algorithm, called FD_Mine. FD_Mine takes advantage of the rich theory of FDs and uses discovered equivalences to reduce both the size of the dataset and the number of FDs to be checked. We show that this pruning does not lead to loss of information. Experiments on 15 UCI datasets show that FD_Mine can prune more candidates than previous methods.
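To make the role of equivalences concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes a relation stored as a list of dictionaries and considers only single-attribute equivalences, whereas the abstract speaks of equivalences in general. The names `holds`, `equivalent`, and `prune_equivalent_attributes` are hypothetical and chosen for illustration. Two attributes are treated as equivalent when each functionally determines the other, so one of them can be dropped from the search without losing information about the other.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first rhs value seen for this lhs value must match all later ones.
        if seen.setdefault(key, val) != val:
            return False
    return True


def equivalent(rows, x, y):
    """x and y are equivalent when x -> y and y -> x both hold."""
    return holds(rows, x, y) and holds(rows, y, x)


def prune_equivalent_attributes(rows, attributes):
    """Keep one representative per group of equivalent single attributes.

    Illustrative assumption: only single attributes are compared.
    Returns (kept attributes, list of (representative, dropped) pairs).
    """
    kept = []
    equivalences = []
    for attr in attributes:
        rep = next((k for k in kept if equivalent(rows, [k], [attr])), None)
        if rep is None:
            kept.append(attr)
        else:
            equivalences.append((rep, attr))
    return kept, equivalences


# Toy relation (made-up data): id and code determine each other.
rows = [
    {"id": 1, "code": "a", "city": "X"},
    {"id": 2, "code": "b", "city": "X"},
    {"id": 3, "code": "c", "city": "Y"},
]
kept, equivalences = prune_equivalent_attributes(rows, ["id", "code", "city"])
print(kept)
print(equivalences)
```

In this toy relation, any dependency involving `code` can be restated in terms of `id`, so candidates containing `code` need not be checked separately. That is the sense in which the pruning loses no information.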
