Efficient discovery of functional and approximate dependencies using partitions

Discovery of functional dependencies from relations has been identified as an important database analysis technique. We present a new approach for finding functional dependencies in large databases, based on partitioning the set of rows with respect to their attribute values. The use of partitions makes the discovery of approximate functional dependencies easy and efficient, and erroneous or exceptional rows can be identified readily. Experiments show that the new algorithm is efficient in practice. For benchmark databases, the running times are improved by several orders of magnitude over previously published results. The algorithm is also applicable to much larger datasets than the previous methods.
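To make the partition idea concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes rows are plain dictionaries. The helper names (`partition`, `fd_holds`, `fd_error`) and the sample data are hypothetical. It groups rows into equivalence classes by their values on an attribute set, tests a dependency X -> A by checking whether adding A splits any class, and, as one assumed way to quantify an approximate dependency, reports the fraction of rows that would have to be removed for the dependency to hold, together with those exceptional rows.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into classes of rows that agree on every attribute in attrs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds exactly when adding rhs to lhs splits no equivalence class."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + [rhs]))


def fd_error(rows, lhs, rhs):
    """Return (fraction of rows to remove so that lhs -> rhs holds, exceptional row indices).

    Assumption: within each lhs class, the rows carrying the most frequent
    rhs value are kept and all others are treated as exceptions.
    """
    exceptions = []
    for cls in partition(rows, lhs):
        by_rhs = defaultdict(list)
        for i in cls:
            by_rhs[rows[i][rhs]].append(i)
        keep = max(by_rhs.values(), key=len)
        for group in by_rhs.values():
            if group is not keep:
                exceptions.extend(group)
    return len(exceptions) / len(rows), sorted(exceptions)


# Hypothetical sample relation; row 4 carries a deliberately wrong country.
rows = [
    {"city": "Oslo", "zip": "0150", "country": "NO"},
    {"city": "Oslo", "zip": "0151", "country": "NO"},
    {"city": "Bergen", "zip": "5003", "country": "NO"},
    {"city": "Paris", "zip": "75001", "country": "FR"},
    {"city": "Paris", "zip": "75002", "country": "US"},
    {"city": "Paris", "zip": "75003", "country": "FR"},
]

exact = fd_holds(rows, ["city"], "country")
error, bad_rows = fd_error(rows, ["city"], "country")
```

On this sample, `city -> country` fails as an exact dependency because the Paris class splits on `country`, while `fd_error` singles out row 4 as the only exception. The sketch rebuilds each partition from scratch for clarity; it makes no attempt to reproduce the efficiency the abstract reports.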
