Unary and n-ary inclusion dependency discovery in relational databases

Foreign keys are among the most fundamental constraints in relational databases. Because they are not always declared in existing databases, discovering them is an important and challenging task. The underlying problem is known as the inclusion dependency (IND) inference problem. In this paper, we devise data-mining algorithms for IND inference in a given database, following a two-step approach. In the first step, unary INDs are discovered using a new preprocessing stage, which leads to a new algorithm and an efficient implementation. In the second step, n-ary INDs are inferred; this step fits the framework of levelwise algorithms used in many data-mining tasks. Since real-world databases can suffer from data inconsistencies, we also consider approximate INDs, i.e. INDs that almost hold, and show how they can be safely integrated into our unary and n-ary discovery algorithms. The algorithms have been implemented and tested against both synthetic and real-life databases. To our knowledge, no other algorithm exists to solve this data-mining problem.
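
To make the notions of unary and approximate INDs concrete, the following is a minimal Python sketch, not the implementation described in the paper. It assumes a database given as in-memory dictionaries, ignores data types and NULL semantics beyond skipping `None`, and uses an inverted index from values to attributes as an illustrative preprocessing step. The function names `unary_inds` and `ind_error`, the sample tables, and the error measure (fraction of distinct left-hand values missing on the right-hand side) are assumptions chosen for illustration.

```python
from collections import defaultdict


def unary_inds(tables):
    """Return all exact unary INDs (A, B), meaning values(A) is a subset of values(B).

    `tables` maps a table name to a dict mapping column name -> list of values.
    Attributes are identified by (table, column) pairs.
    """
    # Preprocessing: inverted index from each value to the attributes holding it.
    index = defaultdict(set)
    attrs = []
    for table, columns in tables.items():
        for column, values in columns.items():
            attr = (table, column)
            attrs.append(attr)
            for value in values:
                if value is not None:  # assumption: NULLs are ignored
                    index[value].add(attr)

    # Every attribute starts as a candidate subset of every other attribute.
    candidates = {a: set(attrs) - {a} for a in attrs}

    # A value held by A rules out every B that does not also hold it.
    for holders in index.values():
        for a in holders:
            candidates[a] &= holders

    # Note: an attribute with no non-null values is vacuously included everywhere.
    return {(a, b) for a in attrs for b in candidates[a]}


def ind_error(lhs_values, rhs_values):
    """Fraction of distinct non-null lhs values that do not appear in rhs."""
    lhs = {v for v in lhs_values if v is not None}
    if not lhs:
        return 0.0
    return len(lhs - set(rhs_values)) / len(lhs)


# Hypothetical sample database, used only for illustration.
tables = {
    "orders": {"cust_id": [1, 2, 2, 3], "order_id": [10, 11, 12, 13]},
    "customers": {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]},
}

exact = unary_inds(tables)
error = ind_error(tables["customers"]["id"], tables["orders"]["cust_id"])
```

On this sample data, `orders.cust_id` is included in `customers.id`, which is the shape of a foreign key candidate. The reverse inclusion fails only because of the value 4, so it would count as an approximate IND under any error threshold of at least one quarter.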
