An Ontology-Based Approach for Data Cleaning

There is no magic solution for data cleaning. The user always has to specify the cleaning operations to perform, and a large number of operations may have to be specified. Yet this is the condition for detecting and correcting data quality problems successfully. Most cleaning operations are generic enough to be applied to different databases. Some operations are limited to databases of the same domain, while others are so general that they are domain independent. The traditional approach to data cleaning is to specify the operations at the database schema level, so several changes are required to reuse a cleaning operation in another database. This paper presents an approach that supports the interoperability of cleaning operations among different databases. This is achieved through an ontological level that supports the conceptual specification of the cleaning operations. This abstraction level isolates the operations from the database schemas and makes them easy to reuse.
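
As a rough illustration of the idea, the following Python sketch states a cleaning operation against an ontology concept and binds it to two databases through per-schema mappings. It is not the paper's implementation: every name in it (CleaningOperation, detect_problems, the concept "Person.email", the column names, and the sample rows) is a hypothetical assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# All names below are hypothetical; the paper does not define this API.


@dataclass(frozen=True)
class CleaningOperation:
    """A cleaning rule stated against an ontology concept, not a column."""
    concept: str                                  # e.g. "Person.email"
    is_valid: Callable[[Optional[object]], bool]  # True when the value is clean


# Each database supplies its own mapping: ontology concept -> column name.
SchemaMapping = Dict[str, str]


def detect_problems(op: CleaningOperation,
                    mapping: SchemaMapping,
                    rows: List[dict]) -> List[int]:
    """Return the indices of the rows that violate the operation."""
    column = mapping[op.concept]  # resolve the concept for this schema
    return [i for i, row in enumerate(rows) if not op.is_valid(row.get(column))]


# One conceptual operation, written once (a deliberately naive check).
email_has_at = CleaningOperation(
    concept="Person.email",
    is_valid=lambda v: isinstance(v, str) and "@" in v,
)

# Two assumed databases that name the same concept differently.
crm_mapping = {"Person.email": "cust_email"}
hr_mapping = {"Person.email": "EMAIL_ADDR"}

crm_rows = [{"cust_email": "ana@example.org"}, {"cust_email": "no-at-sign"}]
hr_rows = [{"EMAIL_ADDR": None}, {"EMAIL_ADDR": "rui@example.org"}]

# The same operation is reused unchanged; only the mapping differs.
crm_problems = detect_problems(email_has_at, crm_mapping, crm_rows)
hr_problems = detect_problems(email_has_at, hr_mapping, hr_rows)
```

In this sketch, reusing the operation in another database requires only a new mapping entry, whereas a schema-level specification would require rewriting the rule against each column name.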