A genetic algorithm to discover relaxed functional dependencies from data

Approximate functional dependencies are used in many emerging application domains, such as identifying data inconsistencies or patterns of semantically related data, query rewriting, and so forth. They approximate the canonical definition of functional dependency (fd) by relaxing the data comparison (i.e., by considering data similarity rather than equality), the extent (i.e., by allowing the dependency to hold on only a subset of the data), or both. Unlike canonical fds, approximate fds are difficult to identify at design time. In this paper, we propose a genetic algorithm to discover approximate fds from data. An empirical evaluation demonstrates the effectiveness of the algorithm.
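To make the two kinds of relaxation concrete, the following minimal Python sketch checks a candidate dependency on a small relation. It is only an illustration under stated assumptions, not the genetic algorithm proposed in the paper: the function names (`similar`, `pair_coverage`, `holds`), the use of `difflib.SequenceMatcher` as the similarity function, the pair-based coverage measure, and the sample rows are all hypothetical choices made here for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold):
    """Relaxation on comparison: values match when their string
    similarity is at least `threshold` (1.0 means plain equality).
    SequenceMatcher is an assumed, illustrative similarity function."""
    return SequenceMatcher(None, str(a), str(b)).ratio() >= threshold


def pair_coverage(rows, lhs, rhs, lhs_threshold=1.0, rhs_threshold=1.0):
    """Fraction of tuple pairs that are similar on every `lhs` attribute
    and are also similar on every `rhs` attribute (assumed measure)."""
    matched = satisfied = 0
    for r, s in combinations(rows, 2):
        if all(similar(r[a], s[a], lhs_threshold) for a in lhs):
            matched += 1
            if all(similar(r[a], s[a], rhs_threshold) for a in rhs):
                satisfied += 1
    # With no pair similar on the left-hand side, the dependency holds vacuously.
    return 1.0 if matched == 0 else satisfied / matched


def holds(rows, lhs, rhs, lhs_threshold=1.0, rhs_threshold=1.0, min_coverage=1.0):
    """Relaxation on extent: accept the dependency when the coverage
    reaches `min_coverage` (1.0 means it must hold on all matching pairs)."""
    coverage = pair_coverage(rows, lhs, rhs, lhs_threshold, rhs_threshold)
    return coverage >= min_coverage


# Hypothetical sample data, invented for this example.
rows = [
    {"city": "Salerno", "zip": "84100"},
    {"city": "Salerno", "zip": "84100"},
    {"city": "Salerno", "zip": "84121"},
    {"city": "Milano", "zip": "20100"},
]

print(holds(rows, ["city"], ["zip"]))                     # canonical fd
print(holds(rows, ["city"], ["zip"], min_coverage=0.3))   # relaxed extent
print(holds(rows, ["city"], ["zip"], rhs_threshold=0.5))  # relaxed comparison
```

With both thresholds and `min_coverage` left at 1.0, the check reduces to the canonical fd test; lowering a similarity threshold relaxes the comparison, and lowering `min_coverage` relaxes the extent. A discovery procedure would have to search over attribute sets and thresholds, which this sketch does not attempt.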
