Efficient and Privacy-Preserving k-Means Clustering for Big Data Mining

Recent advances in sensing and storage technologies have ushered in the big data age, in which huge amounts of data are distributed across sites to be stored and analysed. Cluster analysis is a data mining task that aims to discover patterns and knowledge using algorithmic techniques such as k-means. However, running k-means over distributed big data stores raises serious privacy issues. Many proposed works have attempted to address this concern using cryptographic protocols, but these solutions degrade the performance of analysis tasks, which is incompatible with the requirements of big data. In this work, we propose a novel privacy-preserving k-means algorithm based on a simple yet secure and efficient multiparty additive scheme that is cryptography-free. We designed our solution for horizontally partitioned data. Moreover, we demonstrate that our scheme is secure against adversaries in the passive model.
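
The abstract does not spell out the additive scheme, so the following is only a minimal sketch of the general idea, assuming a textbook additive-masking secure sum rather than the authors' actual protocol. Each party holds its own rows (horizontal partitioning), computes per-cluster sums and counts locally, and splits every value into random additive shares so that only the global totals are reconstructed. All names (`MODULUS`, `SCALE`, `secure_sum`, `kmeans_step`) are hypothetical, and the parties are simulated in a single process.

```python
import random

MODULUS = 2**61 - 1  # assumed ring size for the shares (illustrative choice)
SCALE = 10**6        # assumed fixed-point scale for real-valued coordinates


def encode(x):
    """Map a real number to a fixed-point residue modulo MODULUS."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a residue back to a signed real number."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def split(value, n, rng):
    """Split an encoded value into n random shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, rng):
    """Simulate n parties summing private values via additive shares."""
    n = len(private_values)
    # Row i holds the shares produced by party i; column j is what party j receives.
    shares = [split(encode(v), n, rng) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partials = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return decode(sum(partials) % MODULUS)


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts over one party's own rows."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        c = nearest(p, centroids)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return sums, counts


def kmeans_step(parties, centroids, rng):
    """One k-means update in which only global sums and counts are revealed."""
    stats = [local_stats(points, centroids) for points in parties]
    updated = []
    for c in range(len(centroids)):
        count = secure_sum([s[1][c] for s in stats], rng)
        if count == 0:
            updated.append(list(centroids[c]))  # keep an empty cluster's centroid
            continue
        updated.append([
            secure_sum([s[0][c][d] for s in stats], rng) / count
            for d in range(len(centroids[c]))
        ])
    return updated
```

For example, calling `kmeans_step` with three parties' point lists, two starting centroids, and `random.Random(0)` returns the updated centroids without any party's raw rows leaving `local_stats`. The `random` module is used here only for reproducibility; a real deployment would need a secure randomness source, and the security argument for the paper's own scheme is not reproduced by this sketch.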
