Privacy-Preserving Support Vector Machines Learning

This paper addresses the problem of sharing data among multiple parties without disclosing any party's data to the others. We focus on parties that are jointly involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each holding a private data set, want to collaboratively construct support vector machines using a linear, polynomial, or sigmoid kernel function without disclosing their private data to each other. To tackle this problem, we develop a secure protocol that lets the parties carry out the required computation. The solution is distributed: there is no central, trusted party with access to all the data. Instead, we define a protocol that uses homomorphic encryption to exchange the data while keeping it private. We analyze the protocol in the presence of mistakes and malicious attacks and show that it is robust against them. All parties are treated symmetrically: each participates both in the encryption and in the computation involved in learning the support vector machines.
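The abstract does not spell out the protocol itself. As a rough illustration only, the sketch below assumes vertically partitioned data (each party holds a subset of the features of every record) and shows how an additively homomorphic scheme in the style of Paillier's cryptosystem lets the parties add up their local partial dot products so that only the aggregate Gram matrix is ever decrypted. Because the linear, polynomial, and sigmoid kernels depend on the data only through dot products, all three can then be evaluated from that matrix. The tiny primes, the single key holder, the kernel parameters, and every function name (`encrypt`, `secure_gram`, `polynomial_kernel`, and so on) are hypothetical simplifications, not the paper's construction; a real deployment would use large keys and, to match the symmetric setting described above, a decryption key shared among the parties.

```python
import math
import random

# Toy Paillier key with tiny primes: for illustration only, NOT secure.
# Hypothetical simplification: a single key holder decrypts; the symmetric
# multi-party setting would instead share the decryption key among parties.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # inverse of lambda mod N; valid because g = N + 1


def encrypt(m, rng):
    """Encrypt an integer m as (1 + m*N) * r^N mod N^2, i.e. g = N + 1."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover m, mapping residues above N/2 back to negative integers."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def add_encrypted(c1, c2):
    """Additive homomorphism: E(a) * E(b) mod N^2 decrypts to a + b."""
    return (c1 * c2) % N2


def partial_gram(block):
    """Local Gram matrix over the feature columns one party holds."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in block] for xi in block]


def secure_gram(party_blocks, rng):
    """Combine the parties' encrypted partial Gram matrices, then decrypt the sum."""
    size = len(party_blocks[0])
    encrypted = [
        [[encrypt(v, rng) for v in row] for row in partial_gram(block)]
        for block in party_blocks
    ]
    total = encrypted[0]
    for other in encrypted[1:]:
        total = [
            [add_encrypted(total[i][j], other[i][j]) for j in range(size)]
            for i in range(size)
        ]
    return [[decrypt(c) for c in row] for row in total]


# The three kernels named in the abstract depend on x_i, x_j only via x_i . x_j.
def linear_kernel(g):
    return g


def polynomial_kernel(g, c=1, d=2):
    return (g + c) ** d


def sigmoid_kernel(g, kappa=0.01, theta=0.0):
    return math.tanh(kappa * g + theta)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 3 records; party A holds 2 features, party B holds 1.
    party_a = [[1, 2], [0, -1], [3, 1]]
    party_b = [[4], [2], [-2]]
    gram = secure_gram([party_a, party_b], rng)
    print(gram)  # [[21, 6, -3], [6, 5, -5], [-3, -5, 14]]
    print([[polynomial_kernel(g) for g in row] for row in gram])
```

Decrypting the aggregate still reveals every pairwise dot product to whoever holds the key; how the actual protocol limits this, and how it copes with the mistakes and malicious behavior analyzed in the paper, is not specified in the abstract.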