On approximation measures for functional dependencies

We examine how to measure the degree to which a functional dependency (FD) is approximate. The primary motivation is that approximate FDs represent potentially interesting patterns present in a table, so discovering them is a valuable data mining problem. However, before discovery algorithms can be developed, a measure must be defined that quantifies the degree of approximation. First, we develop an approximation measure by axiomatizing the following intuition: the degree to which X → Y is approximate in a table T is the degree to which T determines a function from ΠX(T) to ΠY(T). We prove that, up to a multiplicative constant, a unique unnormalized measure satisfies these axioms. Next, we compare this measure with two other measures from the literature. In all but one case, we show that the measures can be made to differ as much as possible within normalization. We then examine these measures on several real datasets and observe that many of the theoretically possible extreme differences do not arise in practice. Finally, we offer some conclusions about the particular situations in which certain measures are more appropriate than others.
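
To make the intuition concrete, the following minimal Python sketch computes two well-known ways of scoring how far X → Y is from holding exactly in a table. It is an illustration only: the abstract does not name the measures it studies, so the choice of these two scores (the fraction of rows that must be removed for the FD to hold, and the conditional entropy of Y given X) is an assumption. The function names, the treatment of the table as a list of dictionaries with duplicate rows counted, and the example data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def _group_counts(rows, x_cols, y_cols):
    """Map each X-value to a Counter of the Y-values that co-occur with it."""
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        groups[x][y] += 1
    return groups


def removal_fraction(rows, x_cols, y_cols):
    """Smallest fraction of rows to delete so that X -> Y holds exactly.

    For each X-value we keep only the rows carrying its most frequent Y-value.
    """
    groups = _group_counts(rows, x_cols, y_cols)
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / len(rows)


def conditional_entropy(rows, x_cols, y_cols):
    """Empirical H(Y | X) in bits; it is 0 exactly when X -> Y holds."""
    groups = _group_counts(rows, x_cols, y_cols)
    n = len(rows)
    h = 0.0
    for counter in groups.values():
        group_size = sum(counter.values())
        for count in counter.values():
            # p(x, y) * log2 p(y | x), summed over the observed (x, y) pairs
            h -= (count / n) * log2(count / group_size)
    return h


# Hypothetical example table: one row breaks the dependency zip -> city.
rows = [
    {"zip": "47401", "city": "Bloomington"},
    {"zip": "47401", "city": "Bloomington"},
    {"zip": "47401", "city": "Ellettsville"},
    {"zip": "10001", "city": "New York"},
]

violation = removal_fraction(rows, ["zip"], ["city"])      # one of four rows
uncertainty = conditional_entropy(rows, ["zip"], ["city"])  # positive, in bits
```

Both scores are 0 when the table determines a function from the X-values to the Y-values and grow as that function becomes less well defined, but they weight violations differently, which is the kind of disagreement a comparison of measures would examine.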
