Privacy Preserving Data Mining

In this paper we introduce the concept of privacy-preserving data mining. In our model, two parties that own confidential databases wish to run a data mining algorithm on the union of their databases without revealing any unnecessary information. This problem has many important practical applications, such as medical research on confidential patient records. Data mining algorithms are usually complex, and their inputs are typically measured in megabytes, if not gigabytes. A generic secure multi-party computation solution, based on evaluating a circuit that computes the algorithm on the entire input, is therefore of no practical use. We focus on the problem of decision tree learning and use ID3, a widely used algorithm for this problem. We present a solution that is considerably more efficient than generic solutions: it requires very few rounds of communication and reasonable bandwidth. In our solution, each party performs on its own a computation of the same order as running the ID3 algorithm on its own database. The results are then combined using efficient cryptographic protocols whose overhead is only logarithmic in the number of transactions in the databases. We believe that our result is a substantial contribution, demonstrating that secure multi-party computation can be made practical even for complex problems and large inputs.
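
To make the target computation concrete, the following is a minimal, non-private sketch of the ID3 attribute-selection step on the union of two databases. It is not the protocol of this paper: here the per-party counts are added in the clear, whereas the point of the paper is to obtain the same result without revealing them. All names (`local_counts`, `information_gain`, `best_attribute`, the toy records and the `label` field) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def local_counts(rows, attribute, label="label"):
    """Count (attribute value, class label) pairs in ONE party's database."""
    return Counter((row[attribute], row[label]) for row in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-label histogram."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total)
                for c in class_counts.values() if c)


def information_gain(joint_counts):
    """ID3 information gain of an attribute from (value, label) counts."""
    total = sum(joint_counts.values())
    by_label = Counter()
    by_value = {}
    for (value, lab), c in joint_counts.items():
        by_label[lab] += c
        by_value.setdefault(value, Counter())[lab] += c
    # Expected entropy of the label after splitting on the attribute.
    conditional = sum(sum(cc.values()) / total * entropy(cc)
                      for cc in by_value.values())
    return entropy(by_label) - conditional


def best_attribute(db_a, db_b, attributes):
    """Pick the attribute with the highest gain on the union of both databases.

    ASSUMPTION: counts are combined in the clear. A privacy-preserving
    protocol would replace this addition with a cryptographic step.
    """
    gains = {}
    for a in attributes:
        combined = local_counts(db_a, a) + local_counts(db_b, a)
        gains[a] = information_gain(combined)
    return max(gains, key=gains.get), gains


# Toy data: each party holds two records.
db_a = [{"outlook": "sunny", "windy": "yes", "label": "no"},
        {"outlook": "sunny", "windy": "no", "label": "no"}]
db_b = [{"outlook": "rain", "windy": "yes", "label": "yes"},
        {"outlook": "rain", "windy": "no", "label": "yes"}]

best, gains = best_attribute(db_a, db_b, ["outlook", "windy"])
print(best, gains)
```

In this toy example neither party can rank the attributes from its own records alone, since each local database contains a single class label; only the combined counts show that `outlook` separates the classes. The sketch also mirrors the division of work described above: `local_counts` is the part each party can run on its own data, and only the combination step would need a secure protocol.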
