This paper proposes the concept and applications of the sectional and conditional functional dependency (SCFD), an important extension of the conditional functional dependency (CFD) and the functional dependency (FD). SCFDs describe relationships between parts of an attribute and other attributes, and they can be used as rules during data cleaning. Two algorithms, DBCFD and DKMP, are designed for SCFD discovery. DBCFD finds general SCFDs over the attributes that appear in CFDs, while DKMP finds SCFDs over the remaining attributes that do not appear in CFDs. Together, DBCFD and DKMP ensure the completeness of the discovered SCFDs. We also provide an SQL-based technique for cleaning data with SCFDs. In experiments, we evaluate the effectiveness and efficiency of SCFDs on datasets generated by TPC-H, and the results illustrate the effect of our algorithms on two kinds of real datasets.
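To make the idea of a dependency on part of an attribute concrete, the following is a minimal illustrative sketch, not the DBCFD or DKMP algorithm. It assumes a hypothetical rule in which a section of one attribute (here, the first three characters of a phone number) determines another attribute (the city). The function name `find_violations`, the attribute names, and the sample rows are all assumptions introduced for illustration.

```python
from collections import defaultdict


def find_violations(rows, section, lhs_attr, rhs_attr):
    """Group rows by a section (slice) of lhs_attr and return the groups
    whose rhs_attr values disagree, i.e. candidate rule violations."""
    groups = defaultdict(set)
    for row in rows:
        key = row[lhs_attr][section]  # the "part of an attribute"
        groups[key].add(row[rhs_attr])
    # A section value that maps to more than one right-hand value breaks the rule.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# Hypothetical sample data: the last row contains a misspelled city.
rows = [
    {"phone": "212-555-0101", "city": "New York"},
    {"phone": "212-555-0102", "city": "New York"},
    {"phone": "617-555-0103", "city": "Boston"},
    {"phone": "617-555-0104", "city": "Bostn"},
]

violations = find_violations(rows, slice(0, 3), "phone", "city")
print(violations)  # {'617': ['Bostn', 'Boston']}
```

In this sketch the section is fixed by hand as a slice. Discovering which sections of which attributes carry such dependencies is the task the abstract assigns to DBCFD and DKMP.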