Towards a Practical Clustering Analysis over Encrypted Data

Clustering analysis is one of the most significant unsupervised machine learning tasks, and it is used in many privacy-sensitive fields, including bioinformatics, finance, and image processing. In this paper, we propose a practical solution for privacy-preserving clustering analysis based on homomorphic encryption (HE). Our work is the first HE solution for the mean-shift clustering algorithm. To reduce the super-linear complexity of the original mean-shift algorithm, we adopt a novel random sampling method called dust sampling, which is well suited to HE and achieves linear complexity. We also replace non-polynomial kernels with a new polynomial kernel that can be computed efficiently in HE. The HE implementation of our modified mean-shift clustering algorithm, based on the approximate HE scheme HEAAN, shows prominent performance in terms of speed and accuracy. It takes about 30 minutes with 99% accuracy over several public datasets with hundreds of data points, and even for a dataset with 262,144 data points it takes only 82 minutes by applying SIMD operations in HEAAN. Our results outperform the previously best known result (SAC 2018) by a factor of more than 400.
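
As a rough plaintext illustration of the two ideas named above, the following Python sketch runs mean-shift updates only on a small random sample of points ("dusts") and weights neighbors with a polynomial kernel. It is a sketch under stated assumptions, not the implementation evaluated in the paper: the kernel form (1 - ||x - y||^2)^degree, the function names, and all parameter values are hypothetical, and the data are assumed to be scaled so that every pairwise distance is at most 1. No encryption is involved, and the sketch uses ordinary division, which an HE evaluation would have to replace with a polynomial approximation.

```python
import random


def poly_kernel(x, y, degree=8):
    # Assumed polynomial kernel (1 - ||x - y||^2)^degree.
    # Only meaningful when ||x - y|| <= 1 (data scaled beforehand).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return (1.0 - sq_dist) ** degree


def mean_shift_step(dust, points, degree=8):
    # Move one dust to the kernel-weighted mean of all data points.
    weights = [poly_kernel(dust, p, degree) for p in points]
    total = sum(weights)  # plaintext division; HE would approximate 1/total
    dim = len(dust)
    return tuple(
        sum(w * p[j] for w, p in zip(weights, points)) / total
        for j in range(dim)
    )


def dust_mean_shift(points, num_dusts=4, iterations=10, degree=8, seed=0):
    # Sample a fixed number of dusts and shift only those, so each
    # iteration costs num_dusts * len(points) kernel evaluations.
    rng = random.Random(seed)
    dusts = rng.sample(points, num_dusts)
    for _ in range(iterations):
        dusts = [mean_shift_step(d, points, degree) for d in dusts]
    return dusts
```

With the number of dusts held fixed, the per-iteration cost of this sketch grows linearly in the number of data points, whereas shifting every point against every other point grows quadratically.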
