Discovering context-aware conditional functional dependencies

Conditional functional dependencies (CFDs) are an important technique for ensuring data consistency. However, CFDs are limited in their ability to 1) provide reasonable values for consistency repairing and 2) detect potential errors. This paper presents context-aware conditional functional dependencies (CCFDs), which help provide reasonable values and detect potential errors. In particular, we focus on automatically discovering minimal CCFDs. We introduce context relativity to measure the relationship between CFDs. The overlap of related CFDs can provide reasonable values, which results in more accurate consistency repairing, and some related CFDs are combined into CCFDs. Moreover, we prove that discovering minimal CCFDs is NP-complete, and we design both a precise method and a heuristic method. We also introduce the dominating value to facilitate the process in both methods. Additionally, the context relativity of the CFDs affects the cleaning results, so we suggest an approximate threshold of context relativity according to the data distribution. Our empirical evaluation confirms that the repairing results are more accurate.
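
For readers unfamiliar with CFDs, the following minimal sketch shows how a single CFD can be checked against a table. It illustrates ordinary CFD violation detection only, not the CCFD discovery methods or the context relativity measure described above. The function names, the `"_"` wildcard marker, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker meaning "any value" in a pattern


def matches(row, attrs, pattern):
    """True if the row agrees with the pattern on every listed attribute."""
    return all(pattern[a] == WILDCARD or row[a] == pattern[a] for a in attrs)


def cfd_violations(rows, lhs, rhs, pattern):
    """Return sorted indices of rows that violate the CFD (lhs -> rhs, pattern)."""
    violating = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs, pattern):
            continue  # the CFD only constrains rows matching the LHS pattern
        # Single-row violation: the pattern fixes a constant on the RHS.
        if pattern[rhs] != WILDCARD and row[rhs] != pattern[rhs]:
            violating.add(i)
        groups[tuple(row[a] for a in lhs)].append(i)
    # Multi-row violation: same LHS values but different RHS values.
    for idxs in groups.values():
        if len({rows[i][rhs] for i in idxs}) > 1:
            violating.update(idxs)
    return sorted(violating)


# Illustrative data: within the UK, a zip code should determine the street.
rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "US", "zip": "07974", "street": "Mtn Ave"},
    {"country": "US", "zip": "07974", "street": "Main St"},
]
uk_pattern = {"country": "UK", "zip": WILDCARD, "street": WILDCARD}
violations = cfd_violations(rows, ["country", "zip"], "street", uk_pattern)
print(violations)
```

The two US rows are not flagged because the pattern restricts the dependency to rows where `country` is `UK`. A check like this can flag the conflicting rows but cannot say which street value is the right one, which is the limitation the abstract attributes to CFDs.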
