Utility of Knowledge Extracted from Unsanitized Data when Applied to Sanitized Data

Knowledge discovery systems extract from data knowledge that can be used to make predictions about incomplete data items. Utility is a measure of the usefulness of the discovered knowledge and of the user's satisfaction with that knowledge. We motivate and address the question of the usefulness of sanitized data using the notion of utility in data mining systems. To this end, we use a previously developed framework to measure how successfully patterns and rules discovered from the original data make predictions about the sanitized data. Using experimental results on a set of medical data, we demonstrate that it is possible to make useful predictions about sanitized medical data when rules discovered from the original, unsanitized medical data are used. We explain our results and compare them with the case where no sanitization is involved.