On the use of decentralization to enable privacy in web-scale recommendation services

We present the design, implementation, and evaluation of a decentralized framework for enabling privacy in Web-scale recommendation services. Our framework, which comprises a decentralized middleware hosted and run by federated entities, is designed to support both collaborative-filtering and content-based recommendations. We design a novel distributed protocol that clusters users into interest groups of like-minded members and ensures a desired minimum group size (the k-anonymity parameter), while keeping user profiles on the client side only. To aggregate users' consumption for the purpose of generating recommendations, we design a novel decentralized aggregation mechanism that protects against the auxiliary-information attacks that have crippled conventional k-anonymity-based systems. Our prototype system ensures that the desired k-anonymity level is met, can prevent auxiliary-information attacks using a middleware of modest size, and is empirically shown to resist a moderate degree of collusion. While preserving privacy, our system enables effective clustering of like-minded users and offers good recommendation quality. In addition, the prototype's decentralized design and lightweight protocols enable almost linear scaling as the size of the middleware increases.
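The idea of forming interest groups with a minimum size k while profiles stay on the client can be illustrated with a small sketch. This is not the protocol described in the paper: it assumes, purely for illustration, that each client locally derives a short bit signature from its profile using random hyperplanes, and that only this signature (never the profile) is shared. The function names `local_signature` and `group_with_min_size`, and the rule of coarsening all groups by shortening the signature prefix, are hypothetical choices made for this example.

```python
from collections import defaultdict


def local_signature(profile, hyperplanes):
    """Runs on the client (assumed): map a profile vector to a bit string.

    Each bit records on which side of a hyperplane the profile lies, so
    similar profiles tend to share signature bits. Only the returned
    string would leave the client in this sketch.
    """
    bits = []
    for plane in hyperplanes:
        dot = sum(p * w for p, w in zip(profile, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def group_with_min_size(signatures, k):
    """Group users by signature prefix so every group has at least k members.

    signatures: dict mapping user id -> equal-length bit string.
    Starts from the full signature and shortens the prefix for all users
    until no group is smaller than k (a simplification for illustration).
    """
    if not signatures:
        return {}
    length = len(next(iter(signatures.values())))
    for prefix_len in range(length, -1, -1):
        groups = defaultdict(list)
        for user, sig in signatures.items():
            groups[sig[:prefix_len]].append(user)
        if all(len(members) >= k for members in groups.values()):
            return dict(groups)
    # Even a single group of all users is smaller than k.
    raise ValueError("fewer than k users in total")


# Usage: two hypothetical hyperplanes over 2-dimensional profiles.
planes = [[1.0, 0.0], [0.0, 1.0]]
profiles = {
    "alice": [0.9, 0.8],
    "bob": [0.7, -0.2],
    "carol": [-0.5, 0.6],
    "dave": [-0.4, -0.9],
}
sigs = {user: local_signature(vec, planes) for user, vec in profiles.items()}
groups = group_with_min_size(sigs, k=2)
```

Shortening the prefix globally is the simplest way to reach the minimum size in a sketch; a real system would more plausibly merge only the undersized groups and would need to defend the grouping step itself against collusion, which this example does not attempt.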
