Large-scale k-means clustering with user-centric privacy-preservation

A k-means clustering with a new privacy-preserving concept, user-centric privacy preservation, is presented. In this framework, users can conduct data mining using their private information by storing them in their local storage. After the computation, they obtain only the mining result without disclosing private information to others. In most cases, the number of parties that can join conventional privacy-preserving data mining has been assumed to be only two. In our framework, we assume large numbers of parties join the protocol; therefore, not only scalability but also asynchronism and fault-tolerance is important. Considering this, we propose a k-mean algorithm combined with a decentralized cryptographic protocol and a gossip-based protocol. The computational complexity is O(log n) with respect to the number of parties n, and experimental results show that our protocol is scalable even with one million parties.

[1]  Andrew Chi-Chih Yao,et al.  How to Generate and Exchange Secrets (Extended Abstract) , 1986, FOCS.

[2]  A. Yao How to generate and exchange secrets , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[3]  Torben P. Pedersen A Threshold Cryptosystem without a Trusted Party (Extended Abstract) , 1991, EUROCRYPT.

[4]  David Heckerman,et al.  Empirical Analysis of Predictive Algorithms for Collaborative Filtering , 1998, UAI.

[5]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[6]  Jacques Stern,et al.  Advances in Cryptology — EUROCRYPT ’99 , 1999, Lecture Notes in Computer Science.

[7]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[8]  Mihir Bellare Advances in Cryptology — CRYPTO 2000 , 2000, Lecture Notes in Computer Science.

[9]  Ivan Damgård,et al.  A Generalisation, a Simplification and Some Applications of Paillier's Probabilistic Public-Key System , 2001, Public Key Cryptography.

[10]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[11]  Wenliang Du,et al.  Building decision tree classifier on private data , 2002 .

[12]  Joydeep Ghosh,et al.  Privacy-preserving distributed clustering using generative models , 2003, Third IEEE International Conference on Data Mining.

[13]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[14]  Helen J. Wang,et al.  Resilient peer-to-peer streaming , 2003, 11th IEEE International Conference on Network Protocols, 2003. Proceedings..

[15]  Johannes Gehrke,et al.  Gossip-based computation of aggregate information , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..

[16]  Kien A. Hua,et al.  ZIGZAG: an efficient peer-to-peer scheme for media streaming , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[17]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[18]  Chris Clifton,et al.  Privacy-preserving clustering with distributed EM mixture modeling , 2004, Knowledge and Information Systems.

[19]  Benny Pinkas,et al.  Fairplay - Secure Two-Party Computation System , 2004, USENIX Security Symposium.

[20]  GehrkeJohannes,et al.  Privacy preserving mining of association rules , 2004 .

[21]  Foundations of Cryptography: Frontmatter , 2004 .

[22]  Oded Goldreich,et al.  Foundations of Cryptography: Volume 2, Basic Applications , 2004 .

[23]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[24]  Nikos A. Vlassis,et al.  Newscast EM , 2004, NIPS.

[25]  Benny Pinkas,et al.  Fairplay - Secure Two-Party Computation System (Awarded Best Student Paper!) , 2004 .

[26]  Rebecca N. Wright,et al.  Privacy-preserving distributed k-means clustering over arbitrarily partitioned data , 2005, KDD '05.

[27]  Sheng Zhong,et al.  Privacy-Preserving Classification of Customer Data without Loss of Accuracy , 2005, SDM.

[28]  Márk Jelasity,et al.  Gossip-based aggregation in large dynamic networks , 2005, TOCS.

[29]  Somesh Jha,et al.  Privacy Preserving Clustering , 2005, ESORICS.

[30]  Taneli Mielikäinen,et al.  Cryptographically private support vector machines , 2006, KDD '06.

[31]  Chris Clifton,et al.  Privacy-preserving Naïve Bayes classification , 2008, The VLDB Journal.

[32]  Jaideep Vaidya,et al.  Knowledge and Information Systems , 2007 .

[33]  Michael Kearns,et al.  Privacy-Preserving Belief Propagation and Sampling , 2007, NIPS.

[34]  Wenliang Du,et al.  A hybrid multi-group approach for privacy-preserving data mining , 2009, Knowledge and Information Systems.

[35]  Shigenobu Kobayashi,et al.  Privacy-preserving reinforcement learning , 2008, ICML '08.