Privacy Preserving Clustering

The freedom and transparency of information flow on the Internet have heightened concerns about privacy. Given a set of data items, clustering algorithms group similar items together. Clustering has many applications, such as customer-behavior analysis, targeted marketing, forensics, and bioinformatics. In this paper, we present the design and analysis of a privacy-preserving k-means clustering algorithm in which only the cluster means at the various steps of the algorithm are revealed to the participating parties. The crucial step in our privacy-preserving k-means algorithm is the privacy-preserving computation of cluster means. We present two protocols for this computation, one based on oblivious polynomial evaluation and the other based on homomorphic encryption. We have implemented our algorithm in Java and used the implementation to evaluate our privacy-preserving clustering algorithm thoroughly on three data sets. Our evaluation demonstrates that privacy-preserving clustering is feasible: for example, our homomorphic-encryption-based algorithm finished clustering a large data set in approximately 66 seconds.
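To make the cluster-mean step concrete, the following is a minimal sketch of how two parties might compute a joint mean with additively homomorphic encryption. It is an illustration under stated assumptions, not the protocol from the paper: it assumes a Paillier-style scheme, semi-honest parties, non-negative integer inputs, and one-dimensional data, and both parties are simulated in a single process. The names `keygen`, `encrypt`, `decrypt`, `add`, `scale`, and `private_mean` are hypothetical, and the primes are toy values that offer no real security.

```python
import math
import secrets
from fractions import Fraction

# Toy primes (both are Mersenne primes); far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Paillier-style key pair with generator g = n + 1 (assumed scheme)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)          # modular inverse; needs Python 3.8+
    return n, (phi, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    phi, mu = sk
    u = pow(c, phi, n * n)        # equals 1 + m*phi*n (mod n^2)
    return ((u - 1) // n * mu) % n

def add(n, c1, c2):
    """Enc(a) * Enc(b) mod n^2 decrypts to a + b."""
    return (c1 * c2) % (n * n)

def scale(n, c, k):
    """Enc(a)^k mod n^2 decrypts to k * a."""
    return pow(c, k, n * n)

def private_mean(sum_a, count_a, sum_b, count_b):
    """Mean of the union of two local clusters, given each party's
    local (sum, count). Assumes count_a + count_b > 0 and that all
    blinded values stay below the modulus n."""
    # Party A: generate keys, send encrypted local sum and count to B.
    n, sk = keygen()
    enc_sum, enc_cnt = encrypt(n, sum_a), encrypt(n, count_a)

    # Party B: add its own values under encryption, then blind both
    # totals with the same random factor z before returning them.
    z = secrets.randbelow(2**32) + 1
    blinded_sum = scale(n, add(n, enc_sum, encrypt(n, sum_b)), z)
    blinded_cnt = scale(n, add(n, enc_cnt, encrypt(n, count_b)), z)

    # Party A: decrypt z*(sum) and z*(count); z cancels in the ratio.
    zs = decrypt(n, sk, blinded_sum)
    zc = decrypt(n, sk, blinded_cnt)
    return Fraction(zs, zc)

print(private_mean(10, 4, 26, 8))   # joint mean of (10 + 26) / (4 + 8)
```

In this sketch, party A only ever decrypts the blinded totals, and party B only ever sees ciphertexts. A real deployment would need cryptographically sized keys, a careful treatment of what the blinded values leak, and a per-coordinate run for multidimensional data.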
