Privacy-Preserving Genetic Algorithms for Rule Discovery

Decision tree induction algorithms generally adopt a greedy approach to select attributes in order to optimize some criteria at each iteration of the tree induction process. When a decision tree has been constructed, a set of decision rules may be correspondingly derived. Univariate decision tree induction algorithms generally yield the same tree regardless of how many times it is induced from the same training data set. Genetic algorithms have been shown to discover a better set of rules, albeit at the expense of efficiency. In this paper, we propose a protocol for secure genetic algorithms for the following scenario: Two parties, each holding an arbitrarily partitioned data set, seek to perform genetic algorithms to discover a better set of rules without disclosing their own private data. The challenge for privacy-preserving genetic algorithms is to allow the two parties to securely and jointly evaluate the fitness value of each chromosome using each party's private data but without compromising their data privacy. We propose a new protocol to address this challenge that is correct and secure. The proposed protocol is not only privacy-preserving at each iteration of the genetic algorithm, the intermediate results generated at each iteration do not compromise the data privacy of the participating parties.

[1]  Jaideep Vaidya,et al.  Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data , 2006, SAC.

[2]  Deborah R. Carvalho,et al.  A Genetic Algorithm With Sequential Niching For Discovering Small-disjunct Rules , 2002, GECCO.

[3]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[4]  Bart Goethals,et al.  On Private Scalar Product Computation for Privacy-Preserving Data Mining , 2004, ICISC.

[5]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[6]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[7]  Dr. Alex A. Freitas Data Mining and Knowledge Discovery with Evolutionary Algorithms , 2002, Natural Computing Series.

[8]  Alex A. Freitas,et al.  An ant colony based system for data mining: applications to medical data , 2001 .

[9]  Chris Clifton,et al.  Privacy-preserving Naïve Bayes classification , 2008, The VLDB Journal.

[10]  Chris Clifton,et al.  Secure set intersection cardinality with application to association rule mining , 2005, J. Comput. Secur..

[11]  Casimir A. Kulikowski,et al.  Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems , 1990 .

[12]  Wenliang Du,et al.  Privacy-preserving cooperative statistical analysis , 2001, Seventeenth Annual Computer Security Applications Conference.

[13]  Rebecca N. Wright,et al.  Privacy-preserving distributed k-means clustering over arbitrarily partitioned data , 2005, KDD '05.

[14]  Taneli Mielikäinen,et al.  Cryptographically private support vector machines , 2006, KDD '06.

[15]  Mihir Bellare Advances in Cryptology — CRYPTO 2000 , 2000, Lecture Notes in Computer Science.

[16]  Jaideep Vaidya,et al.  Privacy-Preserving SVM Classification on Vertically Partitioned Data , 2006, PAKDD.

[17]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.