A FRAMEWORK FOR MEASURING AND IMPROVING DATA QUALITY

When data is integrated from different heterogeneous sources and loaded into a data warehouse, various data quality problems arise, such as missing values, duplicate records and inconsistent values. A lack of data quality is one of the main reasons data warehouse deployments fail. This research proposes a new framework for measuring and improving data quality. The framework uses three data quality metrics to measure data quality and a Bayesian algorithm to improve it. The completeness, uniqueness and consistency metrics identify missing values, duplicate records and inconsistent values, respectively, and the Bayesian algorithm estimates and replaces missing categorical attribute values. The algorithm uses the simple Bayes method to estimate the posterior probability that a missing attribute belongs to each category, and replaces the missing value with the category that has the maximum posterior probability. An experimental study shows that the Bayesian algorithm estimates missing data with high accuracy.
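The abstract does not give formulas for the three metrics or for the estimator, so the following Python sketch is only one plausible reading under stated assumptions, not the authors' implementation. It assumes each metric is a ratio in [0, 1]: the share of records where an attribute is present (completeness), the share of distinct records (uniqueness), and the share of records passing a domain rule (consistency). It reads "simple Bayes" as naive Bayes, which treats the other attributes as conditionally independent given the category, so a missing value is filled with argmax over c of P(c) times the product of P(x_j | c). Laplace smoothing (the `alpha` parameter) is an added assumption so that an attribute value never seen with a category does not zero out that category. The function names, the `MISSING` marker, the sample records and the grade rule are all hypothetical.

```python
import math
from collections import Counter, defaultdict

MISSING = None  # hypothetical marker for a missing attribute value


def completeness(records, attr):
    """Share of records in which `attr` is present (not MISSING)."""
    if not records:
        return 1.0
    return sum(1 for r in records if r.get(attr) is not MISSING) / len(records)


def uniqueness(records):
    """Share of records that are distinct; exact duplicates count once."""
    if not records:
        return 1.0
    distinct = {tuple(sorted(r.items())) for r in records}
    return len(distinct) / len(records)


def consistency(records, rule):
    """Share of records that satisfy a caller-supplied consistency rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def impute_naive_bayes(records, target, alpha=1.0):
    """Return copies of `records` with missing `target` values replaced by the
    category of maximum (unnormalised) naive Bayes posterior. Requires alpha > 0."""
    complete = [r for r in records if r.get(target) is not MISSING]
    class_counts = Counter(r[target] for r in complete)
    n = len(complete)
    # cond[attr][category][value]: co-occurrence counts from complete records
    cond = defaultdict(lambda: defaultdict(Counter))
    domain = defaultdict(set)  # observed values of each non-target attribute
    for r in complete:
        for a, v in r.items():
            if a != target and v is not MISSING:
                cond[a][r[target]][v] += 1
                domain[a].add(v)

    result = []
    for r in records:
        new = dict(r)
        if r.get(target) is MISSING:
            best, best_score = None, -math.inf
            for c, nc in class_counts.items():
                score = math.log(nc / n)  # log prior P(c)
                for a, v in r.items():
                    if a == target or v is MISSING:
                        continue
                    k = len(domain[a]) + 1  # +1 reserves mass for unseen values
                    # Laplace-smoothed log likelihood log P(v | c)
                    score += math.log((cond[a][c][v] + alpha) / (nc + alpha * k))
                if score > best_score:
                    best, best_score = c, score
            new[target] = best
        result.append(new)
    return result


def valid_grade(r):
    """Hypothetical domain rule: a present grade must be A, B or C."""
    return r.get("grade") in {"A", "B", "C", MISSING}


data = [
    {"region": "north", "dept": "sales", "grade": "A"},
    {"region": "north", "dept": "sales", "grade": "A"},   # duplicate record
    {"region": "south", "dept": "ops", "grade": "B"},
    {"region": "north", "dept": "sales", "grade": MISSING},
    {"region": "south", "dept": "ops", "grade": "b"},     # inconsistent code
    {"region": "south", "dept": "sales", "grade": "B"},
]

print(round(completeness(data, "grade"), 3))          # 0.833
print(round(uniqueness(data), 3))                     # 0.833
print(round(consistency(data, valid_grade), 3))       # 0.833
print(impute_naive_bayes(data, "grade")[3]["grade"])  # A
```

Scores are accumulated in log space to avoid underflow when many attribute likelihoods are multiplied, and comparing unnormalised scores is sufficient because the normalising term P(x) is the same for every category.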