Improving the Data Repairing Process by Using Multiple Rule-Based Methods

This paper proposes a method for repairing dirty data in a database. To handle complex data structures, and building on previous work on this problem, the method combines repair based on regular expressions with repair based on conditional functional dependencies (CFDs) to improve data quality. The dependencies are used to speed up both the repair itself and the search over the data. Repair based on regular expressions is systematic, but its efficiency is affected by the amount of data. The method grew out of our work on a real-world database from the company Standard Solution Group (SSG): we tried other related methods on this data, and they inspired the method proposed here. Experiments on the SSG data show that the method is efficient.
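As an illustration of how the two kinds of rules can work together, the following minimal sketch applies a regular-expression rule to one field and then a CFD to a dependent field. It is not the implementation evaluated in this paper. The field names (`country`, `zip`, `city`), the five-digit pattern, the lookup table, and the function names are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical regex rule: a ZIP code must be exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")

# Hypothetical CFD of the form ([country = 'US', zip] -> city):
# when country is 'US', the zip value determines the city.
CFD_CONDITION = {"country": "US"}
CFD_LOOKUP = {"10001": "New York", "60601": "Chicago"}


def repair_zip(record):
    """Regex step: strip non-digits; keep the value only if it then matches."""
    cleaned = re.sub(r"\D", "", record.get("zip") or "")
    fixed = dict(record)
    fixed["zip"] = cleaned if ZIP_PATTERN.fullmatch(cleaned) else None
    return fixed


def repair_city(record):
    """CFD step: if the condition holds and the zip is known, enforce the city."""
    if all(record.get(k) == v for k, v in CFD_CONDITION.items()):
        expected = CFD_LOOKUP.get(record.get("zip"))
        if expected is not None and record.get("city") != expected:
            fixed = dict(record)
            fixed["city"] = expected
            return fixed
    return record


def repair(record):
    """Apply the regex rule first so the CFD sees a normalized zip value."""
    return repair_city(repair_zip(record))


dirty = {"country": "US", "zip": "10001 ", "city": "Nwe York"}
print(repair(dirty))
```

In this sketch the regular expression is applied first, so the dependency check operates on a normalized value. The order is an assumption made for the example rather than a claim about the proposed method.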