Data enrichment/enhancement

This chapter discusses how data standardization, clustering techniques, and data quality rules can be used to express enrichment directives. It also presents a few data enhancement examples and shows how they can be affected by poor data quality. The goal of enrichment is to provide a platform for deriving more knowledge from collections of data, and there are several ways to enhance data. Converting data to a standardized format is an extremely powerful enhancement: because a standard is a distinct model to which all items in a set must conform, there is usually a well-defined rule set describing both how to determine whether an item conforms to the standard and what actions must be taken to bring an offending item into conformance. Enhancement can also be achieved through provenance, context, data merging, and inference. Data merging helps in deriving more knowledge from data sets. Sometimes regular joins are insufficient, and more sophisticated means, such as approximate matching, are required to link records together. Another step in intelligent enhancement is adding data quality rules. When a record conforms to the rule set, it can be tagged with a new field indicating the location in the information chain where the record was validated, together with a “stamp of approval.” If the record does not conform to the rule set, it can be enriched with a new field that records the rule violated and the location in the information chain where the validation failed. Adding further business rules to the merging process can improve its usability.
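
The three ideas above (standardization, approximate matching during a merge, and rule-based tagging) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the chapter: the abbreviation table, the two sample rules, the field names `validated_at`, `approved`, and `violated_rule`, the 0.85 similarity threshold, and the use of the standard library's `difflib.SequenceMatcher` as a simple stand-in for whatever approximate matching technique is actually chosen.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real standard would define its own.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def standardize(text):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def similarity(a, b):
    """Similarity ratio in [0, 1] between the standardized forms of a and b."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()

def approximate_merge(left, right, key, threshold=0.85):
    """Link each left record to its best-scoring right record, if close enough."""
    merged = []
    for rec in left:
        best = max(right, key=lambda r: similarity(rec[key], r[key]), default=None)
        if best is not None and similarity(rec[key], best[key]) >= threshold:
            # Fields from the left record win when both sides define them.
            merged.append({**best, **rec})
    return merged

# Hypothetical data quality rules: (rule name, predicate over a record).
RULES = [
    ("address_present", lambda r: bool(r.get("address", "").strip())),
    ("zip_is_5_digits", lambda r: re.fullmatch(r"\d{5}", r.get("zip", "")) is not None),
]

def validate(record, stage):
    """Return a copy of the record enriched with the validation outcome."""
    enriched = dict(record, validated_at=stage)
    for name, check in RULES:
        if not check(enriched):
            enriched["approved"] = False      # validation failed at this stage
            enriched["violated_rule"] = name  # first rule that was violated
            return enriched
    enriched["approved"] = True               # the "stamp of approval"
    return enriched

customers = [{"name": "A. Smith", "address": "12 Main St."}]
postal = [
    {"address": "12 main street", "zip": "02139"},
    {"address": "99 Oak Ave", "zip": "0213"},
]

# An exact join on "address" would find no match; the approximate merge does.
linked = approximate_merge(customers, postal, key="address")
results = [validate(r, stage="merge") for r in linked]
rejected = validate(postal[1], stage="intake")
```

In this sketch the customer record links to the first postal record because both addresses standardize to the same string, and the merged record is then stamped as approved at the hypothetical `merge` stage. The second postal record fails the ZIP rule, so it is enriched with the name of the violated rule and the stage at which validation failed.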