Protecting Data through Perturbation Techniques: The Impact on Knowledge Discovery in Databases

Data perturbation is a data security technique that adds ‘noise’ to a database to preserve the confidentiality of individual records. The technique lets users ascertain key summary information about the data that is not distorted, without causing a security breach. Four bias types have been proposed to assess the effectiveness of such techniques. However, these biases address only simple aggregate concepts (averages, etc.) found in the database. To compete in today’s business environment, organizations must use data mining approaches to discover additional knowledge about themselves ‘hidden’ in their databases. Database administrators therefore face competing objectives: protecting confidential data versus disclosing data for data mining applications. This paper empirically explores whether the data protection provided by perturbation techniques adds a so-called data mining bias to the database. The results provide initial support for the existence of this bias.
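
As a rough illustration of the basic idea, the sketch below applies simple additive Gaussian noise to a synthetic numeric attribute. It is an assumption-laden toy, not the specific perturbation method evaluated in this paper: the function name `perturb`, the `salaries` data, and the noise level are all hypothetical. It shows that individual values change while the mean stays close to the original, and that the spread of the perturbed attribute is inflated.

```python
import random
import statistics


def perturb(values, noise_sd, seed=0):
    """Return a copy of `values` with independent zero-mean Gaussian noise added.

    Hypothetical helper for illustration only; simple additive noise is assumed.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


# Synthetic confidential attribute (assumed data, not taken from the paper).
data_rng = random.Random(42)
salaries = [data_rng.gauss(50_000, 10_000) for _ in range(10_000)]

# Release a perturbed copy instead of the original values.
masked = perturb(salaries, noise_sd=5_000)

# The aggregate mean is approximately preserved...
mean_shift = abs(statistics.mean(masked) - statistics.mean(salaries))

# ...but the standard deviation grows, because noise variance adds to data variance.
sd_ratio = statistics.stdev(masked) / statistics.stdev(salaries)

print(f"shift in mean: {mean_shift:.2f}")
print(f"ratio of standard deviations (masked / original): {sd_ratio:.3f}")
```

With zero-mean noise, averages survive perturbation reasonably well, whereas relationships that depend on individual values or on the variance can be distorted, which is the kind of effect a data mining application may be sensitive to.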