Privacy-preserving hybrid collaborative filtering on cross distributed data

Data collected for collaborative filtering (CF) purposes might be cross distributed between two online vendors, which may even be competing companies. Such vendors might want to integrate their data to provide more precise and reliable recommendations. However, due to privacy, legal, and financial concerns, they do not want to disclose their private data to each other. If privacy-preserving measures are in place, they might agree to generate predictions collaboratively from their distributed data. In this study, we investigate how to offer hybrid CF-based referrals with decent accuracy on cross distributed data (CDD) held by two e-commerce sites while maintaining their privacy. Our proposed schemes are designed to prevent the data holders from learning the true ratings and rated items held by each other while still allowing them to provide accurate CF services efficiently. We perform experiments on real data to evaluate our proposals in terms of accuracy. The results show that the proposed methods are able to provide precise predictions. Moreover, we analyze our schemes in terms of privacy and supplementary costs. We demonstrate that our schemes are secure and that the online overhead costs due to privacy concerns are insignificant.
