Efficient two-party privacy-preserving collaborative k-means clustering protocol supporting both storage and computation outsourcing

Abstract Cloud computing has matured and is now applied in many areas. However, privacy remains the most challenging problem obstructing its adoption in privacy-sensitive fields such as finance and government. Advanced cryptographic algorithms protect data privacy through encryption while still supporting computation on the encrypted data. However, a new challenge arises when such ciphertexts come from different parties. In particular, performing collaborative data mining on encrypted data contributed by different parties is a key issue for cloud services. This paper addresses the privacy problem in outsourced k-means clustering for two parties. Each party's data are encrypted only once and then stored in the cloud. The proposed privacy-preserving collaborative k-means clustering protocol is executed mainly at the cloud, with O(k(m + n)) rounds of interaction among the two parties and the cloud, where k is the number of clusters and m and n are the numbers of records held by the two parties, respectively. The protocol is shown to be secure in the semi-honest model, and in the malicious model in which only one party is corrupted during centroid re-computation. Both theoretical and experimental analyses of the proposed scheme are also provided.
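The abstract relies on encryption that still supports computation, which the cloud needs when it re-computes centroids over both parties' data. The abstract does not name the scheme. As an assumption, the sketch below uses a toy Paillier cryptosystem (an additively homomorphic scheme) to show how a cloud could add encrypted coordinates from two parties into an encrypted per-cluster sum without decrypting individual records. It uses a single shared key pair and tiny primes. It therefore does not model the two-party, different-ciphertext setting the paper targets and is not secure. The helper names (`encrypt_points`, `cloud_sum`) and the sample data are hypothetical.

```python
import math
import random

# ASSUMPTION: toy Paillier with tiny primes and ONE shared key pair.
# Illustration only, not secure, and not the paper's construction.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                      # common simplification g = n + 1
LAM = math.lcm(P - 1, Q - 1)   # Carmichael's lambda(n)
MU = pow(LAM, -1, N)           # with g = n + 1, mu = lambda^-1 mod n


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: Enc(m1) * Enc(m2) mod N^2 decrypts to m1 + m2."""
    return (c1 * c2) % N2


def encrypt_points(points):
    """Hypothetical: a party encrypts every coordinate of its records once."""
    return [[encrypt(x) for x in p] for p in points]


def cloud_sum(enc_points, dim):
    """Hypothetical: the cloud combines ciphertexts per dimension, seeing no plaintext."""
    acc = [encrypt(0) for _ in range(dim)]
    for p in enc_points:
        for d in range(dim):
            acc[d] = add_encrypted(acc[d], p[d])
    return acc


# Made-up records that both parties have assigned to the same cluster j.
party_a = [(2, 3), (4, 5)]
party_b = [(6, 1)]
enc = encrypt_points(party_a) + encrypt_points(party_b)
sums = [decrypt(c) for c in cloud_sum(enc, dim=2)]
centroid = [s / (len(party_a) + len(party_b)) for s in sums]
print(sums, centroid)  # [12, 9] [4.0, 3.0]
```

Here the cluster size is treated as public and the key holder decrypts only the aggregated sums. A real protocol must also decide who learns cluster sizes and how ciphertexts under different parties' keys are combined. The abstract attributes those design points to the proposed scheme, and this sketch does not reproduce them.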
