14 – Data cleansing

Publisher Summary

Data cleansing is the process of transforming data perceived to be incorrect into a form that is perceived to be correct. This chapter discusses data standards and standardization, the process of bringing data into conformance with a standard. It also looks at common error paradigms that create the need for cleansing. It then discusses metadata cleansing, that is, making sure that the enterprise metadata reflects the defined metadata guidelines and that the data resource conforms to the metadata. The chapter also describes some of the more routine forms of data cleansing, including merge/purge and duplicate elimination, and updating missing fields. It presents U.S. Postal address standardization as an example of a well-defined standard for which there is a well-developed application solution for data cleansing. Finally, it discusses cleansing in the context of migrating data from legacy systems to new application systems.