Privacy Preserving Clustering by Data Transformation

Despite their benefits in a wide range of applications, data mining techniques have also raised a number of ethical issues, including privacy, data security, and intellectual property rights. In this paper, we address the privacy problem of unauthorized secondary use of information. To do so, we introduce a family of geometric data transformation methods (GDTMs) which ensure that the mining process will not violate privacy up to a certain degree of security. We focus primarily on privacy preserving data clustering, notably on partition-based and hierarchical methods. Our proposed methods distort only confidential numerical attributes to meet privacy requirements, while preserving the general features needed for clustering analysis. Our experiments demonstrate that the methods are effective and provide acceptable values in practice for balancing privacy and accuracy. We report the main results of our performance evaluation and discuss some open research issues.
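The abstract does not spell out the individual transformations, so the following is only a minimal sketch of what a geometric transformation of confidential numerical attributes might look like. It assumes a plain 2D rotation applied to two chosen columns; the function names, the sample records, and the 37-degree angle are hypothetical and are not taken from the paper. A rotation is used here because it leaves pairwise Euclidean distances unchanged, which is one way the general features used by distance-based clustering could survive the distortion. The actual GDTMs may use other transformations and need not preserve distances exactly.

```python
import math


def rotate_confidential(rows, i, j, angle_deg):
    """Rotate attributes i and j of every row by angle_deg degrees.

    All other attributes are copied unchanged. This is an illustrative
    assumption, not the transformation defined in the paper.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for row in rows:
        new_row = list(row)
        x, y = row[i], row[j]
        new_row[i] = c * x - s * y
        new_row[j] = s * x + c * y
        out.append(new_row)
    return out


def pairwise_distances(rows):
    """Euclidean distance between every unordered pair of rows."""
    return [math.dist(a, b) for k, a in enumerate(rows) for b in rows[k + 1:]]


# Hypothetical records: (age, salary, non-confidential flag).
records = [
    [23.0, 48000.0, 1.0],
    [31.0, 52000.0, 0.0],
    [58.0, 91000.0, 1.0],
]

# Distort only the two confidential columns (indices 0 and 1).
distorted = rotate_confidential(records, 0, 1, 37.0)

before = pairwise_distances(records)
after = pairwise_distances(distorted)
```

In this sketch the released values in columns 0 and 1 differ from the originals, while `before` and `after` agree up to floating-point error, so a distance-based clustering algorithm would see the same geometry in both datasets.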
