Unsupervised Machine Learning on Encrypted Data

Machine Learning has recently been one of the most popular applications of Fully Homomorphic Encryption (FHE), which allows computations on encrypted data. Prior work, however, has focused on supervised learning, where a labeled training set is used to configure the model. In this work, we take the first step into unsupervised learning, an important area of Machine Learning with many real-world applications, by addressing the clustering problem. To this end, we show how to implement the \(K\)-Means-Algorithm. This algorithm poses several challenges in the FHE context, including a division, which we tackle by using a natural encoding that allows division and may be of independent interest. While this solves the problem in theory, its practical performance is not optimal, so we then propose changes to the clustering algorithm that make it executable under more conventional encodings. We show that our new algorithm achieves a clustering accuracy comparable to that of the original \(K\)-Means-Algorithm while requiring less than \(5\%\) of its runtime.
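To make the division challenge concrete, the following is a minimal plaintext sketch of the standard \(K\)-Means iteration (Lloyd's algorithm), not the encrypted variant proposed in this work. The function names `squared_distance` and `k_means` and the fixed iteration count are assumptions chosen for illustration. The comments mark the two steps that are commonly hard to evaluate on encrypted data: the comparison in the assignment step and the division in the centroid update.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; needs only additions and multiplications."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    iterations: int = 10,
) -> Tuple[List[Point], List[int]]:
    """Plaintext K-Means with a fixed number of iterations (illustrative only)."""
    centroids = list(initial_centroids)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: pick the nearest centroid for every point.
        # This argmin is a comparison, which is costly under FHE.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]),
            )
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if not members:
                continue  # keep the old centroid if the cluster is empty
            dim = len(members[0])
            # The division by the cluster size is the operation that
            # conventional FHE encodings do not support directly.
            centroids[k] = tuple(
                sum(p[d] for p in members) / len(members) for d in range(dim)
            )
    return centroids, assignment
```

Calling `k_means(points, initial_centroids)` returns the final centroids together with the cluster index of each point. The divisor `len(members)` depends on the assignment and would itself be encrypted in the FHE setting, which is why the division cannot simply be replaced by multiplication with a public constant.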
