Mining functional dependencies from data

We propose FD_Mine, an efficient rule discovery algorithm for mining functional dependencies (FDs) from data. By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of FDs to be checked. We first describe four effective pruning rules that reduce the size of the search space. In particular, the number of FDs to be checked is reduced by skipping the search for FDs that are logically implied by already discovered FDs. We then present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. Finally, we report a series of experiments showing that the proposed algorithm is effective on 15 UCI datasets and on synthetic data.
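To make the notions of a functional dependency and of attribute equivalence concrete, the following is a minimal sketch in Python. It is not the FD_Mine algorithm and does not implement its four pruning rules. It only checks whether an FD holds in a small relation and finds pairs of single attributes that determine each other. The helper names `holds` and `equivalent_attributes` and the toy relation `rows` are assumptions made for illustration.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    The FD holds when no two rows agree on every lhs attribute
    while disagreeing on some rhs attribute.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight of key, else returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True

def equivalent_attributes(rows, attributes):
    """Return pairs (a, b) of single attributes with a -> b and b -> a."""
    return [
        (a, b)
        for a, b in combinations(attributes, 2)
        if holds(rows, [a], [b]) and holds(rows, [b], [a])
    ]

# Hypothetical toy relation, invented for this example only.
rows = [
    {"A": 1, "B": "x", "C": 10, "D": 0},
    {"A": 2, "B": "y", "C": 10, "D": 1},
    {"A": 1, "B": "x", "C": 20, "D": 1},
    {"A": 3, "B": "z", "C": 20, "D": 0},
]

equivalents = equivalent_attributes(rows, ["A", "B", "C", "D"])
ac_determines_d = holds(rows, ["A", "C"], ["D"])
bc_determines_d = holds(rows, ["B", "C"], ["D"])
```

In this toy relation, A and B determine each other, so any FD that mentions B can be obtained from the corresponding FD on A by substitution: once {A, C} -> D has been checked, {B, C} -> D is implied and need not be tested separately. This is the kind of saving that equivalences make possible, shown here only in its simplest form.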
