A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data

Data mining can extract important knowledge from large data collections, but these collections are sometimes split among several parties. Data warehousing, which brings data from multiple sources under a single authority, increases the risk of privacy violations. Furthermore, privacy concerns may prevent the parties from directly sharing even some metadata. Distributed data mining and processing offer a way to address this issue, particularly if queries are processed so that nothing is disclosed beyond the final result. This chapter describes methods for mining horizontally partitioned data without violating privacy and discusses how to use the data mining results in a privacy-preserving way. The methods described here incorporate cryptographic techniques to minimize the information shared while adding as little overhead as possible to the mining and processing task.
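
As a rough illustration of computing a result without disclosing anything beyond it, the sketch below simulates a secure sum over horizontally partitioned data, where each party holds different records with the same attributes and contributes one local count. It uses additive secret sharing, which is a standard building block and not necessarily the protocol the chapter describes. The names make_shares, secure_sum, and MODULUS are hypothetical, all parties are simulated in one process, and the parties are assumed to follow the protocol honestly.

```python
import secrets

# Assumed public modulus; it must exceed any total the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(local_values, modulus=MODULUS):
    """Simulate a secure sum: no party's input is ever sent in the clear."""
    n = len(local_values)
    # shares_sent[i][j] is the share that party i would send to party j.
    shares_sent = [make_shares(v, n, modulus) for v in local_values]
    # Each party j adds the shares it received and publishes only that sum.
    partial = [
        sum(shares_sent[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published partial sums combine to the global total.
    return sum(partial) % modulus


# Hypothetical local support counts held by three sites.
local_counts = [120, 45, 310]
total = secure_sum(local_counts)
print(total)
```

Each individual share is uniformly random, so a party that sees only the shares sent to it learns nothing about another party's count from them. A real deployment would also need secure channels between parties and an analysis of collusion, both of which this sketch omits.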
