Does Protecting Databases Using Perturbation Techniques Impact Knowledge Discovery

Data perturbation is a data security technique that adds noise in the form of random numbers to numerical database attributes with the goal of maintaining individual record confidentiality. Generalized Additive Data Perturbation (GADP) methods are a family of techniques that preserve key summary information about the data while satisfying security requirements of the database administrator. However, effectiveness of these techniques has only been studied using simple aggregate measures (averages, etc.) found in the database. To compete in today’s business environment, it is critical that organizations utilize data mining approaches to discover information about themselves potentially hidden in their databases. Thus, database administrators are faced with competing objectives: protection of confidential data versus disclosure for data mining applications. This chapter empirically explores whether data protection provided by perturbation techniques adds a so-called Data Mining Bias to the database. While the results of the original study found limited support for this idea, IDEA GROUP PUBLISHING This chapter appears in the book, Advanced Topics in Database Research, vol. 4 edited by Keng Siau © 2005, Idea Group Inc. 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA Tel: 717/533-8845; Fax 717/533-8661; URL-http://www.idea-group.com ITB11281