Discovery Algorithms for Embedded Functional Dependencies

Embedded functional dependencies (eFDs) advance data management applications by combining data completeness and data integrity requirements. We show that the discovery problem for eFDs is NP-complete, W[2]-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Nevertheless, we use novel data structures and search strategies to develop row-efficient, column-efficient, and hybrid algorithms for eFD discovery. Our experiments demonstrate that the algorithms scale well with respect to their design targets, and that ranking eFDs by the number of redundant data values they cause provides useful guidance in identifying meaningful eFDs for applications. Finally, we demonstrate the benefits of introducing completeness requirements and of ranking by the number of redundant data values for approximate and genuine functional dependencies.
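
To make the two central notions concrete, the following is a minimal sketch and not one of the discovery algorithms described above. It assumes the usual reading of an eFD E: X → Y, namely that the functional dependency X → Y must hold on the rows that have no missing value on any attribute in E, and it assumes that a value occurrence counts as redundant when another row in the same scope agrees with it on X. The function names, the use of `None` for missing values, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def scope(rows, E):
    """Rows with no missing value (None) on any attribute in E."""
    return [r for r in rows if all(r[a] is not None for a in E)]

def efd_holds(rows, E, X, Y):
    """Check the FD X -> Y on the scope of E (assumes X and Y are within E)."""
    seen = {}
    for r in scope(rows, E):
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        # setdefault stores the first Y-value seen for this X-value
        if seen.setdefault(key, val) != val:
            return False
    return True

def redundant_values(rows, E, X, Y):
    """Count Y-value occurrences in X-groups of size > 1 within the scope.

    Only meaningful when the eFD holds; assumes Y is disjoint from X.
    """
    groups = defaultdict(int)
    for r in scope(rows, E):
        groups[tuple(r[a] for a in X)] += 1
    return sum(n * len(Y) for n in groups.values() if n > 1)

# Hypothetical sample data: the third row lacks a phone number.
rows = [
    {"zip": "10115", "city": "Berlin", "phone": "111"},
    {"zip": "10115", "city": "Berlin", "phone": "222"},
    {"zip": "10115", "city": "Bonn",   "phone": None},
    {"zip": "80331", "city": "Munich", "phone": "333"},
]

complete = ["zip", "city", "phone"]
holds_on_complete = efd_holds(rows, complete, ["zip"], ["city"])
holds_without_phone = efd_holds(rows, ["zip", "city"], ["zip"], ["city"])
redundancy = redundant_values(rows, complete, ["zip"], ["city"])
```

In this sample, zip → city is violated when only zip and city must be present, but it holds once rows are also required to have a phone number, which illustrates how the completeness requirement E changes which dependencies are satisfied. The redundancy count is the kind of score by which candidate eFDs could be ranked under the stated assumption.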
