Oblivious Sampling with Applications to Two-Party k-Means Clustering

The k -means clustering problem is one of the most explored problems in data mining. With the advent of protocols that have proven to be successful in performing single database clustering, the focus has shifted in recent years to the question of how to extend the single database protocols to a multiple database setting. To date, there have been numerous attempts to create specific multiparty k -means clustering protocols that protect the privacy of each database, but according to the standard cryptographic definitions of “privacy-protection”, so far all such attempts have fallen short of providing adequate privacy. In this paper, we describe a Two-Party k -Means Clustering Protocol that guarantees privacy against an honest-but-curious adversary, and is more efficient than utilizing a general multiparty “compiler” to achieve the same task. In particular, a main contribution of our result is a way to compute efficiently multiple iterations of k -means clustering without revealing the intermediate values. To achieve this, we describe a technique for performing two-party division securely and also introduce a novel technique allowing two parties to securely sample uniformly at random from an unknown domain size. The resulting Division Protocol and Random Value Protocol are of use to any protocol that requires the secure computation of a quotient or random sampling. Our techniques can be realized based on the existence of any semantically secure homomorphic encryption scheme. For concreteness, we describe our protocol based on Paillier Homomorphic Encryption scheme (see Paillier in Advances in: cryptology EURO-CRYPT’99 proceedings, LNCS 1592, pp 223–238, 1999). We will also demonstrate that our protocol is efficient in terms of communication, remaining competitive with existing protocols (such as Jagannathan and Wright in: KDD’05, pp 593–599, 2005) that fail to protect privacy.

[1]  Feng Bao,et al.  Oblivious Scalar-Product Protocols , 2006, ACISP.

[2]  A. Yao How to generate and exchange secrets , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[3]  Osmar R. Zaïane,et al.  Privacy Preserving Clustering by Data Transformation , 2010, J. Inf. Data Manag..

[4]  Somesh Jha,et al.  Privacy Preserving Clustering , 2005, ESORICS.

[5]  S. Rajsbaum Foundations of Cryptography , 2014 .

[6]  Rebecca N. Wright,et al.  Privacy-preserving Bayesian network structure computation on distributed heterogeneous data , 2004, KDD.

[7]  Bart Goethals,et al.  On Private Scalar Product Computation for Privacy-Preserving Data Mining , 2004, ICISC.

[8]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[9]  Rafail Ostrovsky,et al.  Zero-knowledge from secure multiparty computation , 2007, STOC '07.

[10]  Rebecca N. Wright,et al.  Privacy-preserving distributed k-means clustering over arbitrarily partitioned data , 2005, KDD '05.

[11]  Rafail Ostrovsky,et al.  Delayed-Input Non-Malleable Zero Knowledge and Multi-Party Coin Tossing in Four Rounds , 2017, IACR Cryptol. ePrint Arch..

[12]  Chunhua Su,et al.  Privacy-Preserving Two-Party K-Means Clustering via Secure Approximation , 2007, 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07).

[13]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[14]  Jan Camenisch,et al.  Efficient Computation Modulo a Shared Secret with Application to the Generation of Shared Safe-Prime Products , 2002, CRYPTO.

[15]  Tomas Toft,et al.  On Secure Two-Party Integer Division , 2012, Financial Cryptography.

[16]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[17]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[18]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[19]  Stephen R. Tate,et al.  Optimal size integer division circuits , 1989, STOC '89.

[20]  Bart Mennink,et al.  Modulo Reduction for Paillier Encryptions and Application to Secure Statistical Analysis , 2010, Financial Cryptography.

[21]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[22]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[23]  Eike Kiltz,et al.  Secure Computation of the Mean and Related Statistics , 2005, IACR Cryptol. ePrint Arch..

[24]  Ran Canetti,et al.  Security and Composition of Multiparty Cryptographic Protocols , 2000, Journal of Cryptology.

[25]  Moni Naor,et al.  Oblivious Polynomial Evaluation , 2006, SIAM J. Comput..

[26]  Octavian Catrina,et al.  Secure Computation with Fixed-Point Numbers , 2010, Financial Cryptography.

[27]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[28]  Thijs Veugen,et al.  Encrypted integer division and secure comparison , 2014, Int. J. Appl. Cryptogr..

[29]  Paul S. Bradley,et al.  Refining Initial Points for K-Means Clustering , 1998, ICML.