Privacy-preserving distributed clustering

Clustering is a very important tool in data mining and is widely used in on-line services for medical, financial and social environments. The main goal in clustering is to create sets of similar objects in a data set. The data set to be used for clustering can be owned by a single entity, or in some cases, information from different databases is pooled to enrich the data so that the merged database can improve the clustering effort. However, in either case, the content of the database may be privacy sensitive and/or commercially valuable such that the owners may not want to share their data with any other entity, including the service provider. Such privacy concerns lead to trust issues between entities, which clearly damages the functioning of the service and even blocks cooperation between entities with similar data sets. To enable joint efforts with private data, we propose a protocol for distributed clustering that limits information leakage to the untrusted service provider that performs the clustering. To achieve this goal, we rely on cryptographic techniques, in particular homomorphic encryption, and further improve the state of the art of processing encrypted data in terms of efficiency by taking the distributed structure of the system into account and improving the efficiency in terms of computation and communication by data packing. While our construction can be easily adjusted to a centralized or a distributed computing model, we rely on a set of particular users that help the service provider with computations. Experimental results clearly indicate that the work we present is an efficient way of deploying a privacy-preserving clustering algorithm in a distributed manner.

[1]  Rebecca N. Wright,et al.  Privacy-preserving distributed k-means clustering over arbitrarily partitioned data , 2005, KDD '05.

[2]  Ahmad-Reza Sadeghi,et al.  Improved Garbled Circuit Building Blocks and Applications to Auctions and Computing Minima , 2009, IACR Cryptol. ePrint Arch..

[3]  Andrew Chi-Chih Yao,et al.  How to generate and exchange secrets , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[4]  Rebecca N. Wright,et al.  A New Privacy-Preserving Distributed k-Clustering Algorithm , 2006, SDM.

[5]  Oded Goldreich Foundations of Cryptography: Index , 2001 .

[6]  Stefan Katzenbeisser,et al.  A secure multidimensional point inclusion protocol , 2007, MM&Sec.

[7]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[8]  Zekeriya Erkin,et al.  Privacy-Preserving User Data Oriented Services for Groups with Dynamic Participation , 2013, ESORICS.

[9]  Mauro Barni,et al.  Composite Signal Representation for Fast and Storage-Efficient Processing of Encrypted Signals , 2010, IEEE Transactions on Information Forensics and Security.

[10]  Qi Wang,et al.  On the privacy preserving properties of random data perturbation techniques , 2003, Third IEEE International Conference on Data Mining.

[11]  Osmar R. Zaïane,et al.  Achieving Privacy Preservation when Sharing Data for Clustering , 2004, Secure Data Management.

[12]  Ivan Damgård,et al.  Multiparty Computation Goes Live , 2008, IACR Cryptol. ePrint Arch..

[13]  Rafail Ostrovsky,et al.  Secure two-party k-means clustering , 2007, CCS '07.

[14]  Mauro Barni,et al.  Encrypted signal processing for privacy protection: Conveying the utility of homomorphic encryption and multiparty computation , 2013, IEEE Signal Processing Magazine.

[15]  Chun Zhang,et al.  Storing and querying ordered XML using a relational database system , 2002, SIGMOD '02.

[16]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[17]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[18]  Zekeriya Erkin,et al.  Generating Private Recommendations Efficiently Using Homomorphic Encryption and Data Packing , 2012, IEEE Transactions on Information Forensics and Security.

[19]  Naganand Doraswamy,et al.  Ipsec: the new security standard for the internet , 1999 .

[20]  Zekeriya Erkin,et al.  Privacy-preserving user clustering in a social network , 2009, 2009 First IEEE International Workshop on Information Forensics and Security (WIFS).

[21]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[22]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[23]  Osmar R. Zaïane,et al.  Privacy Preserving Clustering by Data Transformation , 2010, J. Inf. Data Manag..

[24]  Wenliang Du,et al.  Deriving private information from randomized data , 2005, SIGMOD '05.

[25]  Zekeriya Erkin,et al.  Efficient privacy preserving K-means clustering in a three-party setting , 2011, 2011 IEEE International Workshop on Information Forensics and Security.