Improving k-anonymity based privacy preservation for collaborative filtering

Abstract

Collaborative Filtering (CF) is used in recommender systems to predict users' preferences by filtering information or patterns. Privacy Preserving Collaborative Filtering (PPCF) aims to protect privacy during the recommendation process. It is increasingly important in recommender systems and has attracted much interest in recent years. Existing PPCF methods are mainly based on cryptography, obfuscation, perturbation and differential privacy. They suffer from high computational cost, low data quality and difficulty in calibrating the magnitude of the noise. This paper proposes a (p, l, α)-diversity method that improves the existing k-anonymity method in PPCF. Here p is the attacker's prior knowledge about users' ratings, and (l, α) is the diversity among users in each group, which raises the level of privacy preservation. To achieve (l, α)-diversity, the users in each equivalence class must come from at least l (l ≤ k) clusters in α clustering results. To this end, we first apply a Latent Factor Model (LFM) to reduce matrix sparsity. We then propose an improved Maximum Distance to Average Vector (MDAV) microaggregation algorithm based on importance partitioning. It increases the homogeneity of the records within each group and thus retains better data quality in the (p, l, α)-diversity model. Finally, we apply t-closeness in PPCF. Theoretical analysis and experimental results demonstrate that our approach provides a higher level of privacy preservation and less information loss than existing methods.
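The LFM densification step can be illustrated with a minimal sketch. It assumes a plain matrix-factorization latent factor model trained by stochastic gradient descent on the observed ratings only. The exact LFM variant used in the paper, its hyperparameters, and the helper name `lfm_fill` are assumptions for illustration. Missing cells are filled with model predictions, and observed ratings are kept unchanged.

```python
import numpy as np


def lfm_fill(R, mask, n_factors=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Densify a sparse rating matrix with a latent factor model (hypothetical helper).

    R    : (n_users, n_items) array of ratings; cells where mask is False are ignored.
    mask : boolean array of the same shape, True where a rating was observed.
    Assumption: plain SGD matrix factorization R ~ P @ Q.T with L2 regularization.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, n_factors))  # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, n_factors))  # item latent factors
    users, items = np.nonzero(mask)
    for _ in range(epochs):
        for u, i in zip(users, items):
            err = R[u, i] - P[u] @ Q[i]  # prediction error on one observed rating
            pu = P[u].copy()  # old user vector, needed for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    # Keep observed ratings; use model predictions only for missing cells.
    return np.where(mask, R, P @ Q.T)


# Toy 4 x 4 rating matrix; 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
dense = lfm_fill(R, mask)
print(np.round(dense, 2))
```

Keeping the observed ratings verbatim means that only the previously empty cells carry model estimates. This is one reasonable reading of "reducing sparsity" before microaggregation.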
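The grouping step builds on MDAV microaggregation. The sketch below implements standard MDAV, not the paper's importance-partitioning improvement, which the abstract does not specify. It pairs MDAV with a check of the (l, α)-diversity condition. It makes two assumptions. First, `clusterings` holds the α clustering results as label arrays, one per result. Second, a group satisfies the condition when its members span at least l distinct clusters in every one of those α results. The function names `mdav` and `satisfies_l_alpha_diversity` are hypothetical.

```python
import numpy as np


def mdav(X, k):
    """Standard MDAV microaggregation (hypothetical helper name).

    Partitions the rows of X into groups of k records; the last group may
    hold between k and 2k - 1 records. Returns a list of lists of row indices.
    """
    if X.shape[0] < k:
        raise ValueError("need at least k records")
    remaining = np.arange(X.shape[0])
    groups = []

    def farthest_from(point):
        # Remaining record with the largest Euclidean distance to `point`.
        d = np.linalg.norm(X[remaining] - point, axis=1)
        return remaining[int(np.argmax(d))]

    def take_group(seed):
        # Group the k remaining records closest to record `seed`, then remove them.
        nonlocal remaining
        d = np.linalg.norm(X[remaining] - X[seed], axis=1)
        chosen = remaining[np.argsort(d, kind="stable")[:k]]
        groups.append(chosen.tolist())
        remaining = np.setdiff1d(remaining, chosen)

    while len(remaining) >= 3 * k:
        centroid = X[remaining].mean(axis=0)  # the "average vector"
        r = farthest_from(centroid)  # farthest record from the average vector
        s = farthest_from(X[r])  # farthest record from r
        take_group(r)
        take_group(s)
    if len(remaining) >= 2 * k:
        take_group(farthest_from(X[remaining].mean(axis=0)))
    groups.append(remaining.tolist())  # leftover k..2k-1 records form the last group
    return groups


def satisfies_l_alpha_diversity(groups, clusterings, l):
    """True if every group spans >= l distinct clusters in each clustering.

    Assumption: `clusterings` is a list of alpha label arrays, where
    labels[i] is the cluster assigned to user i in one clustering result.
    """
    return all(
        len({labels[u] for u in group}) >= l
        for labels in clusterings
        for group in groups
    )


X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [20.0], [21.0]])
groups = mdav(X, k=3)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])  # a single clustering result (alpha = 1)
print(groups)
print(satisfies_l_alpha_diversity(groups, [labels], l=2))
```

For the toy data, `mdav(X, k=3)` yields `[[7, 6, 5], [0, 1, 2, 3, 4]]`. With `labels` as the only clustering, these groups pass the check for l = 2. They fail it for l = 3, because the first group spans only clusters 2 and 3. This illustrates the trade-off described above: grouping by distance alone makes groups homogeneous, and the diversity constraint then pulls in the opposite direction.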
