Privacy-preserving Bayesian network structure computation on distributed heterogeneous data

As more and more activities are carried out using computers and computer networks, the amount of potentially sensitive data stored by business, governments, and other parties increases. Different parties may wish to benefit from cooperative use of their data, but privacy regulations and other privacy concerns may prevent the parties from sharing their data. Privacy-preserving data mining provides a solution by creating distributed data mining algorithms in which the underlying data is not revealed.In this paper, we present a privacy-preserving protocol for a particular data mining task: learning the Bayesian network structure for distributed heterogeneous data. In this setting, two parties owning confidential databases wish to learn the structure of Bayesian network on the combination of their databases without revealing anything about their data to each other. We give an efficient and privacy-preserving version of the K2 algorithm to construct the structure of a Bayesian network for the parties' joint data.

[1]  Joan Feigenbaum,et al.  Secure multiparty computation of approximations , 2001, TALG.

[2]  Chris Clifton,et al.  Privacy Preserving Naïve Bayes Classifier for Vertically Partitioned Data , 2004, SDM.

[3]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[4]  Gregory F. Cooper,et al.  A Bayesian method for the induction of probabilistic networks from data , 1992, Machine-mediated learning.

[5]  Alexandre V. Evfimievski,et al.  Information sharing across private databases , 2003, SIGMOD '03.

[6]  Wenliang Du,et al.  Secure Multi-party Computational Geometry , 2001, WADS.

[7]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[8]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[9]  KantarciogluMurat,et al.  Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data , 2004 .

[10]  Lynn A. Karoly,et al.  Health Insurance Portability and Accountability Act of 1996 (HIPAA) Administrative Simplification , 2010, Practice Management Consultant.

[11]  Wenliang Du,et al.  Privacy-preserving cooperative statistical analysis , 2001, Seventeenth Annual Computer Security Applications Conference.

[12]  Jaideep Vaidya,et al.  Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data , 2003 .

[13]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[14]  Gregory F. Cooper,et al.  A Bayesian method for the induction of probabilistic networks from data , 1992, Machine Learning.

[15]  Philip K. Chan,et al.  Advances in Distributed and Parallel Knowledge Discovery , 2000 .

[16]  Benny Pinkas,et al.  Efficient Private Matching and Set Intersection , 2004, EUROCRYPT.

[17]  Paul Ashley,et al.  From privacy promises to privacy management: a new approach for enforcing privacy throughout an enterprise , 2002, NSPW '02.

[18]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[19]  Kenji Yamanishi,et al.  Distributed cooperative Bayesian learning strategies , 1997, COLT '97.

[20]  David Maxwell Chickering,et al.  Learning Bayesian Networks is , 1994 .

[21]  Yuval Ishai,et al.  Selective private function evaluation with applications to private statistics , 2001, PODC '01.

[22]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[23]  Rong Chen,et al.  Learning Bayesian Network Structure from Distributed Data , 2003, SDM.