Generating k-Anonymous Microdata by Fuzzy Possibilistic Clustering

Collecting, releasing and sharing microdata about individuals is needed in some domains to support research initiatives that aim to create valuable new knowledge by means of data mining and analysis tools. Individuals' anonymity must therefore be ensured prior to publication in order to guarantee their privacy. k-Anonymity by microaggregation is a widely accepted model for data anonymization. It consists in breaking the association between the identities of data subjects, i.e. individuals, and their confidential information. However, this method shows its limits when dealing with real datasets, which are characterized by a large number of attributes and the presence of noisy data. Decreasing the information loss incurred during the anonymization process is thus a compelling task. This paper aims to address this challenge. To do so, we propose a microaggregation algorithm called Micro-PFSOM, based on fuzzy possibilistic clustering. The main thrust of this algorithm lies in applying a hybrid anonymization process.
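
For readers unfamiliar with microaggregation, the following is a minimal Python sketch of the general idea only: records are partitioned into groups of at least k members, each record is replaced by its group mean, and the information loss is measured as the ratio SSE/SST. It is not the Micro-PFSOM algorithm described in this paper. The grouping heuristic (ordering records by distance to the global centroid), the function names `microaggregate` and `information_loss`, and the sample records are all assumptions made for illustration.

```python
from statistics import fmean


def microaggregate(records, k):
    """Replace each numeric record by the mean of a group of >= k records.

    Naive illustrative heuristic (an assumption, not the paper's method):
    sort records by distance to the global centroid and cut the ordering
    into consecutive groups of size k.
    """
    n = len(records)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    dims = len(records[0])
    centroid = [fmean(r[d] for r in records) for d in range(dims)]

    def dist2(i):
        # Squared Euclidean distance of record i to the global centroid.
        return sum((records[i][d] - centroid[d]) ** 2 for d in range(dims))

    order = sorted(range(n), key=dist2)
    n_groups = n // k
    groups = [order[g * k:(g + 1) * k] for g in range(n_groups)]
    # Leftover records join the last group, so its size is between k and 2k-1.
    groups[-1].extend(order[n_groups * k:])

    anonymized = [None] * n
    for group in groups:
        mean = tuple(fmean(records[i][d] for i in group) for d in range(dims))
        for i in group:
            anonymized[i] = mean
    return anonymized


def information_loss(original, anonymized):
    """Return SSE / SST: 0 means no loss, 1 means all variance was removed."""
    dims = len(original[0])
    centroid = [fmean(r[d] for r in original) for d in range(dims)]
    sse = sum((o[d] - a[d]) ** 2
              for o, a in zip(original, anonymized) for d in range(dims))
    sst = sum((o[d] - centroid[d]) ** 2 for o in original for d in range(dims))
    return sse / sst if sst else 0.0


# Hypothetical (age, income) records, invented for this example.
sample = [(25, 30000), (27, 32000), (45, 60000), (47, 62000), (33, 41000)]
released = microaggregate(sample, k=2)
print(released, information_loss(sample, released))
```

In practice the attributes would normally be standardized first, since an attribute with a large numeric range (income here) otherwise dominates the distances. A smaller SSE/SST ratio means the released data stays closer to the original, which is the quantity a microaggregation method tries to reduce.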
