Evaluating Fuzzy Clustering Algorithms for Microdata Protection

Microaggregation is a well-known technique for data protection. It is usually defined operationally as a two-step process: (i) a large number of small clusters are built from the data, and (ii) the data are replaced by cluster aggregates. In this work we study the use of fuzzy clustering in the first step. In particular, we consider standard fuzzy c-means and entropy-based fuzzy c-means. For both methods, our study includes variable-size and fixed-size variants. The resulting masking methods are compared using standard scoring methods.
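
The two-step process can be illustrated with a minimal sketch, which is not the method evaluated in this work. It assumes standard fuzzy c-means with fuzzifier m = 2, assigns each record to its highest-membership cluster, and uses the cluster mean as the aggregate. The function names (`memberships`, `fuzzy_c_means`, `microaggregate`) and the toy records are hypothetical, and the sketch does not enforce a minimum cluster size or cover the entropy-based and variable-size variants.

```python
import random

def memberships(points, centers, m):
    """Fuzzy c-means membership update: u_ik proportional to d_ik^(-2/(m-1))."""
    u = []
    for p in points:
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = [max(sum((a - b) ** 2 for a, b in zip(p, v)), 1e-12) for v in centers]
        w = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(w)
        u.append([x / total for x in w])
    return u

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Alternate membership and center updates for a fixed number of iterations."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        for k in range(c):
            # Center k is the mean of all records weighted by u_ik^m.
            wk = [row[k] ** m for row in u]
            total = sum(wk)
            centers[k] = [sum(w * p[j] for w, p in zip(wk, points)) / total
                          for j in range(dim)]
    return centers, memberships(points, centers, m)

def microaggregate(points, c, m=2.0):
    """Step (i): cluster the records. Step (ii): replace each record by its cluster mean."""
    _, u = fuzzy_c_means(points, c, m)
    # Assumption: each record goes to its highest-membership cluster.
    labels = [max(range(c), key=lambda k: row[k]) for row in u]
    dim = len(points[0])
    means = {}
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        means[k] = [sum(p[j] for p in members) / len(members) for j in range(dim)]
    return [means[lab] for lab in labels]

# Toy records with two numeric attributes (made up for illustration).
records = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (8.0, 9.0), (8.1, 9.2), (7.9, 8.8)]
for original, masked in zip(records, microaggregate(records, c=2)):
    print(original, "->", masked)
```

Because every record is replaced by the mean of its own cluster, the per-attribute totals of the masked file match those of the original file, while individual values are shared by all records in the same cluster.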
