On the Use of LSH for Privacy Preserving Personalization

Locality Sensitive Hashing (LSH), a technique for scalably finding nearest neighbors, can be adapted to discover similar users while preserving their privacy. The key idea is to compute the user profile on the end-user device, apply LSH to the local profile, and use the LSH cluster identifier as the user's interest group identifier. By the properties of LSH, the interest group comprises other users with similar interests. The collective behavior of the members of the interest group is collected anonymously at an aggregation node to generate recommendations for the group members. The quality of recommendations depends on the effectiveness of the LSH clustering algorithm, that is, its ability to group similar users together. In contrast to the conventional use of LSH (for scalability rather than privacy), our framework cannot perform a linear search over the cluster members to identify the nearest neighbors and prune away false positives. Good clustering quality is therefore of functional importance for our system. We report in this work how changing the nature of the LSH inputs, which in our case are the user profile representations, affects the performance of LSH-based clustering and the final quality of recommendations. We present extensive performance evaluations of the LSH-based privacy-preserving recommender system using two large datasets: MovieLens ratings and Delicious bookmarks.
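
The flow described above (hash the profile on the device, report only the group identifier, aggregate per group) can be illustrated with a minimal sketch. The abstract does not say which LSH family the system uses, so the sketch assumes sign random projections (random hyperplane hashing) purely for illustration. The names `lsh_group_id` and `Aggregator`, the shared `seed`, and the toy profiles are all hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict

import numpy as np


def lsh_group_id(profile, n_bits=8, seed=42):
    """Map a local profile vector to an n_bits-bit interest group identifier.

    Assumption: sign-random-projection LSH; the paper's hash family may differ.
    """
    profile = np.asarray(profile, dtype=float)
    # Every device must derive the same hyperplanes, so the seed is shared
    # (an assumption of this sketch, not a detail from the paper).
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((n_bits, profile.shape[0]))
    # One bit per hyperplane: which side of it the profile falls on.
    bits = (hyperplanes @ profile) >= 0
    return "".join("1" if b else "0" for b in bits)


class Aggregator:
    """Hypothetical aggregation node: it sees group ids and items, not user ids."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def report(self, group_id, items):
        # Accumulate item popularity per interest group.
        self.counts[group_id].update(items)

    def recommend(self, group_id, known_items, k=3):
        # Most popular items in the group that the requester does not have yet.
        ranked = self.counts[group_id].most_common()
        return [item for item, _ in ranked if item not in known_items][:k]


if __name__ == "__main__":
    # Toy profiles over five hypothetical interest dimensions.
    alice = [5, 4, 0, 0, 1]
    bob = [4, 5, 0, 1, 0]
    node = Aggregator()
    node.report(lsh_group_id(alice), ["film_a", "film_b"])
    node.report(lsh_group_id(bob), ["film_b", "film_c"])
    print(lsh_group_id(alice), lsh_group_id(bob))
    print(node.recommend(lsh_group_id(alice), known_items={"film_a", "film_b"}))
```

In this sketch `n_bits` controls the trade-off the abstract alludes to: more bits yield smaller, more homogeneous groups, while fewer bits yield larger groups with more false positives that, in this setting, cannot be pruned afterwards by a linear search.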
