Paper information - On the design and quantification of privacy preserving data mining algorithms

On the design and quantification of privacy preserving data mining algorithms

The increasing ability to track and collect large amounts of data with the use of current hardware technology has led to an interest in the development of data mining algorithms which preserve user privacy. A recently proposed technique addresses the issue of privacy preservation by perturbing the data and reconstructing distributions at an aggregate level in order to perform the mining. This method is able to retain privacy while accessing the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information, which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction which is more effective than the currently available method in terms of the level of information loss. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when a large amount of data is available, the EM algorithm provides robust estimates of the original distribution. We propose metrics for quantification and measurement of privacy-preserving data mining algorithms. Thus, this paper provides the foundations for measurement of the effectiveness of privacy-preserving data mining algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.
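The perturb-and-reconstruct idea described in the abstract can be illustrated with a minimal sketch. It assumes each original value x is released as z = x + y, where y is drawn from a publicly known noise density, and that the original distribution is approximated by a histogram over fixed bins. The function names (`gaussian_pdf`, `em_reconstruct`), the midpoint approximation, and all parameter values are assumptions made for illustration, not the paper's own implementation.

```python
import numpy as np


def gaussian_pdf(d, sigma):
    """Density of zero-mean Gaussian noise (assumed perturbing distribution)."""
    return np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def em_reconstruct(z, bin_edges, noise_pdf, n_iter=200, tol=1e-8):
    """Estimate histogram weights of the original data from perturbed values.

    z         : perturbed observations z_j = x_j + y_j
    bin_edges : edges of the bins used to discretize the original domain
    noise_pdf : callable giving the known density of the added noise
    Returns (bin midpoints, estimated probability of each bin).
    """
    z = np.asarray(z, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    mids = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    theta = np.full(mids.size, 1.0 / mids.size)  # start from a uniform guess

    # lik[j, i] approximates the likelihood of z_j if x_j lay in bin i,
    # using the bin midpoint as a stand-in for the whole bin.
    lik = noise_pdf(z[:, None] - mids[None, :])

    for _ in range(n_iter):
        # E-step: posterior probability that observation j came from bin i.
        weighted = lik * theta
        denom = np.maximum(weighted.sum(axis=1, keepdims=True), 1e-300)
        post = weighted / denom
        # M-step: new bin weight is the average posterior over observations.
        new_theta = post.mean(axis=0)
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return mids, theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical bimodal original data on [0, 1].
    x = np.concatenate([rng.uniform(0.0, 0.3, 2500), rng.uniform(0.7, 1.0, 2500)])
    sigma = 0.25
    z = x + rng.normal(0.0, sigma, x.size)  # only z would be released
    mids, theta = em_reconstruct(z, np.linspace(0.0, 1.0, 11),
                                 lambda d: gaussian_pdf(d, sigma))
    for m, t in zip(mids, theta):
        print(f"bin centre {m:.2f}: estimated mass {t:.3f}")
```

Only the perturbed values and the noise density enter the reconstruction; the individual original values are never used, which is the property the abstract relies on. A finer bin grid reduces the midpoint approximation error at the cost of more computation per iteration.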

Charu C. Aggarwal | Dakshi Agrawal
