Functional and embedded dependency inference: a data mining point of view

Abstract

Discovering functional dependencies from populated databases has received a great deal of attention because it is a key concern in database analysis. Such a capability is strongly required in database administration and design, and it is also of great interest in other application fields such as query folding. Although the problem has been studied for many years, it has recently been addressed in a novel and more efficient way by applying principles of data mining algorithms. The two algorithms that follow this trend, TANE and Dep-Miner, strongly improve on previous proposals. In this paper, we propose a new approach adopting a data mining point of view. We define a novel characterization of minimal functional dependencies. This formal framework is sound and simpler than related work. We introduce the new concept of free set to capture the sources (left-hand sides) of functional dependencies. The targets (right-hand sides) of such dependencies are characterized using the concepts of closure and quasi-closure of attribute sets. Our approach is implemented in the algorithm FUN, which is particularly efficient: its performance is comparable to or better than that of the two best operational solutions we know of, TANE and Dep-Miner. FUN makes use of various optimization techniques and can work on very large databases. Comparative experiments on real-life and synthetic data with varying degrees of correlation assess the performance of FUN against TANE and Dep-Miner. Moreover, our approach also exhibits, without significant additional execution time, embedded functional dependencies, i.e. dependencies that hold in any subset of the attribute set originally considered. Embedded dependencies capture knowledge that is especially relevant in all fields where materialized data sets are managed (e.g. materialized views widely used in data warehouses).
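The characterization summarized in this abstract can be illustrated with a minimal Python sketch. It rests on the following cardinality-based definitions, which are our reading and are stated here as assumptions. A relation r is a list of equal-length tuples, attributes are column indices, and |X| is the number of distinct values in the projection of r on X. Under these definitions, X → A holds exactly when |X| = |X ∪ {A}|. X is free when removing any attribute from X lowers |X|. The closure X⁺ adds to X every attribute A with |X ∪ {A}| = |X|, and the quasi-closure X° is X together with the closures of all sets X \ {B} for B in X. Every minimal non-trivial dependency is then X → A with X free and A in X⁺ \ X°. The names `count`, `mine_minimal_fds`, and `embedded_fds` are hypothetical. The brute-force distinct-value counting is a simplification that stands in for FUN's optimizations, so this is not the paper's implementation.

```python
def count(r, attrs):
    """|X|: number of distinct values of relation r projected on the attributes in attrs."""
    cols = sorted(attrs)
    return len({tuple(row[a] for a in cols) for row in r})


def mine_minimal_fds(r, n_attrs):
    """Level-wise search over free sets (simplified sketch, not FUN itself).

    Returns minimal non-trivial FDs as (lhs tuple, rhs attribute) pairs.
    """
    attrs = range(n_attrs)
    memo = {}

    def card(x):  # memoized |X| for a frozenset x
        if x not in memo:
            memo[x] = count(r, x)
        return memo[x]

    def closure(x):  # X+ = X plus every A with |X u {A}| = |X|
        return x | {a for a in attrs if a not in x and card(x | {a}) == card(x)}

    def quasi_closure(x):  # X° = X plus the closure of X \ {B} for each B in X
        q = set(x)
        for b in x:
            q |= closure(x - {b})
        return q

    fds = []
    level = [frozenset()]  # the empty set is always free
    while level:
        # Sources are free sets; targets are attributes in X+ but not in X°.
        for x in level:
            for a in sorted(closure(x) - quasi_closure(x)):
                fds.append((tuple(sorted(x)), a))
        free = set(level)
        candidates = set()
        for x in level:
            for a in attrs:
                if a in x:
                    continue
                y = x | {a}
                # Apriori-style pruning: every maximal proper subset must be free,
                # and removing any attribute must strictly lower the cardinality.
                if all(y - {b} in free and card(y - {b}) < card(y) for b in y):
                    candidates.add(y)
        level = sorted(candidates, key=sorted)
    return fds


def embedded_fds(fds, y):
    """Minimal FDs embedded in attribute subset y: those whose attributes all lie in y."""
    y = set(y)
    return [(lhs, a) for lhs, a in fds if set(lhs) <= y and a in y]
```

For example, with `r = [(1, 'a', 'x'), (1, 'b', 'x'), (2, 'a', 'y'), (2, 'b', 'y')]`, `mine_minimal_fds(r, 3)` returns `[((0,), 2), ((2,), 0)]`, meaning attribute 0 determines attribute 2 and vice versa. The `embedded_fds` filter relies on a simple fact: an FD whose attributes all lie in Y holds in the projection of r on Y exactly when it holds in r, and its minimality only involves subsets of Y. Filtering the minimal FDs of r is therefore enough in this sketch. The paper claims FUN obtains embedded dependencies at little extra cost, but this filter is not presented as its method.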
