On the Comparison of Some Fuzzy Clustering Methods for Privacy Preserving Data Mining: Towards the Development of Specific Information Loss Measures

Policy makers and researchers require raw data collected from agencies and companies for their analyses. Nevertheless, any transmission of data to third parties should satisfy some privacy requirements in order to avoid the disclosure of sensitive information. The fields of privacy preserving data mining and statistical disclosure control develop mechanisms for ensuring data privacy. Masking methods are one such mechanism: they release a modified (masked) version of the original data, so that third parties can perform computations on it with a limited risk of disclosure. Disclosure risk and information loss measures have been developed to evaluate to what extent data are protected and to what extent they are perturbed. Most of the information loss measures currently available in the literature are general-purpose ones (i.e., they are not oriented to a particular application). In this work we develop cluster-specific information loss measures for fuzzy clustering. For this purpose, we study how to compare the results of fuzzy clustering, i.e., how to compare fuzzy clusters.
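To make the idea of a cluster-specific information loss measure concrete, the following is a minimal sketch, not the measure developed in this work. It assumes (1) fuzzy c-means as the clustering method, (2) additive Gaussian noise as an example masking method, and (3) a comparison of the two fuzzy partitions through their co-membership matrices, S[i][j] = Σ_k u_ki·u_kj. Because cluster numbers are arbitrary, comparing the membership matrices entry by entry would first require matching clusters; the co-membership matrix sidesteps that because it does not depend on cluster labels. All function names (`fuzzy_c_means`, `add_noise`, `co_membership`, `fuzzy_cluster_information_loss`) are hypothetical illustrations.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means. Returns (centers, U), where U[k][i] is the
    membership of record i in cluster k and each record's memberships sum to 1."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Random initial memberships, normalised so each record's memberships sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for i in range(n):
        s = sum(U[k][i] for k in range(c))
        for k in range(c):
            U[k][i] /= s
    centers = []
    for _ in range(iters):
        # Centre update: mean of the records weighted by u_ki ** m.
        centers = []
        for k in range(c):
            w = [U[k][i] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * data[i][j] for i in range(n)) / total
                            for j in range(d)])
        # Membership update: u_ki = 1 / sum_l (d_ki / d_li) ** (2 / (m - 1)).
        for i in range(n):
            # Clamp distances away from 0 to avoid division by zero.
            dist = [max(math.dist(data[i], centers[k]), 1e-12) for k in range(c)]
            for k in range(c):
                U[k][i] = 1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
    return centers, U


def add_noise(data, sigma, seed=1):
    """Assumed example masking method: additive Gaussian noise on every attribute."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in data]


def co_membership(U):
    """S[i][j] = sum_k u_ki * u_kj: how strongly records i and j share clusters.
    This matrix does not depend on how the clusters are numbered."""
    c, n = len(U), len(U[0])
    return [[sum(U[k][i] * U[k][j] for k in range(c)) for j in range(n)]
            for i in range(n)]


def fuzzy_cluster_information_loss(U_orig, U_masked):
    """Hypothetical measure: mean absolute difference of the two co-membership
    matrices. It lies in [0, 1] and is 0 when the two fuzzy partitions
    coincide up to a relabelling of the clusters."""
    A, B = co_membership(U_orig), co_membership(U_masked)
    n = len(A)
    return sum(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    # Synthetic data: two well-separated groups of 2-D records.
    rng = random.Random(42)
    data = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
            + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(20)])
    _, U_orig = fuzzy_c_means(data, c=2)
    for sigma in (0.1, 1.0, 3.0):
        _, U_masked = fuzzy_c_means(add_noise(data, sigma), c=2)
        print(sigma, round(fuzzy_cluster_information_loss(U_orig, U_masked), 4))
```

In this sketch, stronger masking (larger `sigma`) should make the fuzzy partition of the masked data drift further from that of the original data, and the loss value should grow accordingly. Other ways of comparing fuzzy partitions (for instance, matching clusters first and then comparing memberships) would lead to different measures.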
