MINING APPROXIMATE FUNCTIONAL DEPENDENCIES AS CONDENSED REPRESENTATIONS OF ASSOCIATION RULES

Approximate Functional Dependencies (AFDs) mined from database relations represent potentially interesting patterns and have proven useful for tasks such as feature selection for classification, query optimization, and query rewriting. Although the discovery of Functional Dependencies (FDs) from a relational database is a well-studied problem, the discovery of AFDs remains underexplored and poses its own set of challenges. These include defining the right interestingness measures for AFDs, employing effective pruning strategies, and traversing the search space of the attribute lattice efficiently. This thesis presents a novel perspective on AFDs as condensed representations of association rules; for example, the AFD (Model determines Make) is a condensation of association rules such as (Model:Accord determines Make:Honda) and (Model:Camry determines Make:Toyota). In this regard, the thesis describes two metrics, Confidence and Specificity, which are analogous, respectively, to the standard confidence and support metrics used for association rules. It also presents an algorithm called AFDMiner that efficiently mines high-quality AFDs by employing effective pruning strategies. AFDMiner performs a bottom-up search in the attribute lattice to find all AFDs and FDs that fall within the given Confidence and Specificity thresholds. Experiments on real data sets show the effectiveness of the approach in terms of both performance and the quality of the AFDs generated.
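The condensation idea can be illustrated with a minimal Python sketch. The abstract does not give the thesis's definitions of Confidence or Specificity, so the measure below is an assumption: it scores the AFD by the fraction of rows that survive when each Model value keeps only its most frequent Make, a common error-based formulation. The function names, the toy `cars` relation, and its deliberately inconsistent last row are all hypothetical.

```python
from collections import Counter, defaultdict


def association_rules(rows, lhs, rhs):
    """Value-level rules lhs:x -> rhs:y, mapped to (support, confidence)."""
    n = len(rows)
    pair_counts = Counter((r[lhs], r[rhs]) for r in rows)
    lhs_counts = Counter(r[lhs] for r in rows)
    return {
        (x, y): (count / n, count / lhs_counts[x])
        for (x, y), count in pair_counts.items()
    }


def afd_confidence(rows, lhs, rhs):
    """Assumed attribute-level score for the AFD lhs -> rhs.

    For every lhs value, keep only the rows that agree with its most
    frequent rhs value, then return the fraction of rows kept. This is
    an illustrative stand-in, not necessarily the thesis's Confidence.
    """
    best = defaultdict(int)
    pair_counts = Counter((r[lhs], r[rhs]) for r in rows)
    for (x, _y), count in pair_counts.items():
        best[x] = max(best[x], count)
    return sum(best.values()) / len(rows)


# Hypothetical toy relation; the last row violates Model -> Make.
cars = [
    {"Model": "Accord", "Make": "Honda"},
    {"Model": "Accord", "Make": "Honda"},
    {"Model": "Camry", "Make": "Toyota"},
    {"Model": "Camry", "Make": "Toyota"},
    {"Model": "Camry", "Make": "Honda"},
]

rules = association_rules(cars, "Model", "Make")
score = afd_confidence(cars, "Model", "Make")
```

Here `rules` holds one entry per (Model value, Make value) pair, while `score` summarizes all of them in a single number for the attribute-level dependency, which is the sense in which the AFD condenses the individual rules.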
