ETL based Cleaning on Database

This paper analyses the problem of data cleaning and of automatically identifying "incorrect and inconsistent data" in a dataset. Extraction, Transformation and Loading (ETL) are the steps used to clean a data warehouse. The authors have implemented several algorithms, including cleanString, cleanNumber, hit ratio, check data dictionary and check metadata, in addition to existing data cleaning algorithms such as PNRS. The aim is to improve the quality of data in the database system, with a focus on making a citizen database system free of errors. Selected results and statistics are also reported.
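
The abstract names the cleaning routines but does not describe how they work. As a rough illustration only, the Python sketch below shows what string cleaning, number cleaning and a data dictionary check might look like. The function names clean_string, clean_number and check_data_dictionary, their rules (keep letters only, keep digits only, exact-match lookup) and the sample values are all assumptions made for this example, not the authors' implementation.

```python
import re


def clean_string(value: str) -> str:
    """Hypothetical rule: keep letters and spaces, collapse whitespace, title-case."""
    letters_only = re.sub(r"[^A-Za-z\s]", "", value)
    return " ".join(letters_only.split()).title()


def clean_number(value: str) -> str:
    """Hypothetical rule: drop every character that is not a digit."""
    return re.sub(r"\D", "", value)


def check_data_dictionary(value: str, dictionary: set) -> bool:
    """Hypothetical rule: a value is valid only if it appears in the dictionary."""
    return value in dictionary


# Illustrative record and dictionary (made-up values, not from the paper).
known_cities = {"Delhi", "Mumbai"}
record = {"name": "  raVI   kumar42 ", "phone": "(555) 010-9999", "city": "Dehli"}

cleaned = {
    "name": clean_string(record["name"]),
    "phone": clean_number(record["phone"]),
    "city_is_valid": check_data_dictionary(record["city"], known_cities),
}
print(cleaned)
```

In an ETL pipeline, checks of this kind would typically run in the transformation step, after extraction and before loading, so that records failing the dictionary check can be flagged or corrected before they reach the warehouse.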
