Fuzzy based clustering algorithm for privacy preserving data mining

Organisations often need to share data with one another. The shared data may contain sensitive information about individuals, and disclosing it can lead to a privacy breach, so maintaining individual privacy is a major challenge. Privacy preserving data mining (PPDM) has evolved as a solution to the challenges that arise when data to be mined must be shared. The objective of PPDM is to mine interesting knowledge from the data while maintaining individual privacy. This paper addresses the problem of PPDM by transforming the attributes into fuzzy attributes. Individual privacy is maintained because the exact value of an attribute cannot be predicted, and at the same time better accuracy of the mining results is achieved. Experiments with the ID3 and Naive Bayes classification algorithms over three different datasets show the effectiveness of the approach.
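The idea of replacing exact values with fuzzy attributes can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how the fuzzy sets are obtained (the title suggests a clustering step), so the sketch assumes fixed, hand-chosen trapezoidal membership functions. The attribute "age", the labels, the breakpoints in `AGE_SETS`, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical fuzzy sets for an "age" attribute, each given as the four
# corners (a, b, c, d) of a trapezoid: membership rises on [a, b],
# equals 1 on [b, c], and falls on [c, d]. These values are assumptions.
AGE_SETS: Dict[str, Tuple[float, float, float, float]] = {
    "young": (0.0, 0.0, 20.0, 40.0),
    "middle": (30.0, 40.0, 50.0, 60.0),
    "old": (50.0, 70.0, 120.0, 120.0),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Degree of membership of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzify(x: float, sets: Dict[str, Tuple[float, float, float, float]]) -> str:
    """Return the label of the fuzzy set in which x has the highest membership."""
    return max(sets, key=lambda label: trapezoid(x, *sets[label]))


def fuzzify_column(values: List[float],
                   sets: Dict[str, Tuple[float, float, float, float]]) -> List[str]:
    """Replace every exact value in a column with its fuzzy label."""
    return [fuzzify(v, sets) for v in values]


if __name__ == "__main__":
    ages = [25, 35, 45, 100]
    print(fuzzify_column(ages, AGE_SETS))
```

Under these assumptions, the shared column contains only the labels, so a recipient cannot recover an individual's exact value, while a classifier such as ID3 or Naive Bayes can still be trained on the categorical labels.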
