Hiding of User Presence for Privacy Preserving Data Mining

Privacy-preserving data mining is expected to make it possible to acquire valuable knowledge from the combined information sources of several service providers. To this end, research has been conducted on distributed anonymization methods, which combine the personal information held by the providers and anonymize it so that specific user records cannot be identified. However, when the providers' sets of users are not the same, these methods may reveal whether a user is present in either provider. This paper therefore proposes a new indicator that represents the probability of a user's presence being revealed, and introduces a modified distributed anonymization method that satisfies the proposed indicator. We evaluate the method on U.S. census data by calculating the relative error of the anonymized data. The results show that the relative error is approximately 10-25% in specific cases.
