Deriving Private Information from Randomly Perturbed Ratings

Collaborative filtering techniques have become popular in the past several years as an effective way to help people deal with information overload. An important security concern in traditional recommendation systems is that, in providing ratings, users disclose information that may compromise their privacy. Randomized perturbation schemes have been proposed to disguise user ratings while still producing accurate recommendations. However, recent research suggests that perturbation schemes may preserve less privacy than previously believed. We propose two data reconstruction methods that derive original private information from the disguised data used in existing perturbation-based collaborative filtering schemes. One method is based on k-means clustering, and the other uses singular value decomposition (SVD). We analyze, both theoretically and experimentally, the difference between the original data and the reconstructed data. Our experiments show that both methods can derive a considerable amount of the original information. This study helps to determine an empirical trade-off between recommendation accuracy and user privacy in perturbation schemes.
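The idea behind the SVD-based reconstruction can be illustrated with a minimal sketch. It is not the paper's implementation: the synthetic low-rank rating matrix, the Gaussian noise, the chosen rank, and the helper names `svd_reconstruct` and `rmse` are all assumptions made for illustration. The sketch assumes that ratings are approximately low-rank while the added noise is spread across all directions, so keeping only the top singular components of the disguised matrix removes much of the noise.

```python
import numpy as np

def svd_reconstruct(perturbed, rank):
    """Rank-`rank` approximation of the perturbed matrix (hypothetical helper)."""
    u, s, vt = np.linalg.svd(perturbed, full_matrices=False)
    # Keep only the leading singular triplets; the rest is treated as noise.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def rmse(a, b):
    """Root-mean-square difference between two matrices of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic data (assumption): 200 users, 50 items, ratings of true rank 3.
rng = np.random.default_rng(0)
n_users, n_items, rank = 200, 50, 3
original = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))

# Disguise the ratings with zero-mean random noise (assumed Gaussian here).
noise = rng.normal(scale=1.0, size=original.shape)
perturbed = original + noise

# Attempt to recover the original ratings from the disguised data alone.
reconstructed = svd_reconstruct(perturbed, rank)

err_perturbed = rmse(perturbed, original)
err_reconstructed = rmse(reconstructed, original)
print(f"RMSE of disguised data:     {err_perturbed:.3f}")
print(f"RMSE of reconstructed data: {err_reconstructed:.3f}")
```

If the reconstructed matrix is closer to the original than the disguised matrix is, the perturbation hides less than its noise level alone would suggest. In practice the true rank is unknown and would have to be estimated, for example from the decay of the singular values.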
