Towards publishing recommendation data with predictive anonymization

Recommender systems are used to predict user preferences for products or services. In order to seek better prediction techniques, data owners of recommender systems such as Netflix sometimes make their customers' reviews available to the public, which raises serious privacy concerns. With only a small amount of knowledge about individuals and their ratings to some items in a recommender system, an adversary may easily identify the users and breach their privacy. Unfortunately, most of the existing privacy models (e.g., k-anonymity) cannot be directly applied to recommender systems. In this paper, we study the problem of privacy-preserving publishing of recommendation datasets. We represent recommendation data as a bipartite graph, and identify several attacks that can re-identify users and determine their item ratings. To deal with these attacks, we first give formal privacy definitions for recommendation data, and then develop a robust and efficient anonymization algorithm, Predictive Anonymization, to achieve our privacy goals. Our experimental results show that Predictive Anonymization can prevent the attacks with very little impact to prediction accuracy.

[1]  John Riedl,et al.  GroupLens: an open architecture for collaborative filtering of netnews , 1994, CSCW '94.

[2]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[3]  Bradley N. Miller,et al.  Using filtering agents to improve prediction quality in the GroupLens research collaborative filtering system , 1998, CSCW '98.

[4]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[5]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[6]  Douglas B. Terry,et al.  Using collaborative filtering to weave an information tapestry , 1992, CACM.

[7]  Jian Pei,et al.  Utility-based anonymization using local recoding , 2006, KDD '06.

[8]  Justin Zhan,et al.  Preserving Privacy in Joining Recommender Systems , 2008, 2008 International Conference on Information Security and Assurance (isa 2008).

[9]  Ayhan Demiriz,et al.  Constrained K-Means Clustering , 2000 .

[10]  Pierangela Samarati,et al.  Generalizing Data to Provide Anonymity when Disclosing Information , 1998, PODS 1998.

[11]  Samir Khuller,et al.  Achieving anonymity via clustering , 2006, PODS '06.

[12]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[13]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[14]  Cynthia Dwork,et al.  Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography , 2007, WWW '07.

[15]  John Riedl,et al.  Combining Collaborative Filtering with Personal Agents for Better Recommendations , 1999, AAAI/IAAI.

[16]  Panos Kalnis,et al.  On the Anonymization of Sparse High-Dimensional Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[17]  Kamal Ali,et al.  TiVo: making show recommendations using a distributed collaborative filtering architecture , 2004, KDD.

[18]  Vitaly Shmatikov,et al.  How To Break Anonymity of the Netflix Prize Dataset , 2006, ArXiv.

[19]  Dan Suciu,et al.  The Boundary Between Privacy and Utility in Data Publishing , 2007, VLDB.

[20]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.

[21]  Donald F. Towsley,et al.  Resisting structural re-identification in anonymized social networks , 2008, The VLDB Journal.

[22]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[23]  Wenliang Du,et al.  Privacy-preserving collaborative filtering using randomized perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[24]  Kenneth Y. Goldberg,et al.  Eigentaste: A Constant Time Collaborative Filtering Algorithm , 2001, Information Retrieval.

[25]  Jian Pei,et al.  Preserving Privacy in Social Networks Against Neighborhood Attacks , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[26]  John Riedl,et al.  Application of Dimensionality Reduction in Recommender System - A Case Study , 2000 .

[27]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[28]  Danfeng Yao,et al.  The union-split algorithm and cluster-based anonymization of social networks , 2009, ASIACCS '09.

[29]  Daniel Kifer,et al.  Injecting utility into anonymized datasets , 2006, SIGMOD Conference.

[30]  John F. Canny,et al.  Collaborative filtering with privacy via factor analysis , 2002, SIGIR '02.

[31]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[32]  Ting Yu,et al.  Anonymizing bipartite graph data using safe groupings , 2008, Proc. VLDB Endow..

[33]  Elisa Bertino,et al.  Efficient k -Anonymization Using Clustering Techniques , 2007, DASFAA.

[34]  John F. Canny,et al.  Collaborative filtering with privacy , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.