Knowledge Based Data Cleaning for Data Warehouse Quality

This paper describes an approach to improving the quality of data warehouses and operational databases by using knowledge. The benefit of this approach is threefold. First, incorporating knowledge into data cleaning helps meet the user's demands and allows the cleaning process to be extended and modified. The knowledge, which can be extracted automatically or manually, is stored in a repository so that it can be used and validated by an appropriate process. Second, cleaned data are propagated back to their original sources so that the user can validate them, because data cleaning can produce values that are valid but incorrect. In addition, the mutual coherence of the data is ensured. Third, user interaction with the data cleaning process is taken into account so that the user can control it. The proposed approach is based on the idea that data quality should be assured both at the sources and at the target of the data.
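
The three ideas above (a repository of cleaning knowledge, propagation of cleaned values back to the sources, and user validation) can be illustrated with a minimal Python sketch. It is not the implementation described in the paper: the `Rule` class, the `propose` and `apply_approved` functions, the sample rules, and the sample data are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """One piece of cleaning knowledge in the repository (hypothetical shape)."""
    name: str
    column: str
    fix: Callable[[str], str]
    validated: bool = True  # only validated knowledge is applied

def propose(record, rules):
    """Run validated rules on a copy of the record and list the changes they suggest."""
    working = dict(record)
    changes = []
    for rule in rules:
        if not rule.validated or rule.column not in working:
            continue
        old = working[rule.column]
        new = rule.fix(old)
        if new != old:
            working[rule.column] = new
            changes.append((rule.column, old, new, rule.name))
    return changes

def apply_approved(source_row, warehouse_row, changes, approve):
    """Write each user-approved change to both the source and the warehouse row.

    Applying the same change on both sides keeps them mutually coherent;
    rejected changes (valid but possibly incorrect values) touch neither.
    """
    applied = []
    for column, old, new, _rule_name in changes:
        if approve(column, old, new):
            source_row[column] = new
            warehouse_row[column] = new
            applied.append(column)
    return applied

# Illustrative repository and data (assumptions, not taken from the paper).
repository = [
    Rule("trim_city", "city", str.strip),
    Rule("expand_country", "country", lambda v: {"FR": "France"}.get(v, v)),
]
source = {1: {"city": " Paris ", "country": "FR"}}
warehouse = {1: dict(source[1])}

proposed = propose(source[1], repository)
# The user callback stands in for interactive validation: here only the
# city fix is accepted, and the country expansion is rejected.
applied = apply_approved(
    source[1], warehouse[1], proposed,
    approve=lambda column, old, new: column == "city",
)
```

In this sketch the `approve` callback is where user interaction would take place; a real system would more likely present the proposed changes for review than use a fixed predicate.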
