Privacy preserving decision tree learning over multiple parties

Data mining over multiple data sources has emerged as an important practical problem with applications in different areas such as data streams, data-warehouses, and bioinformatics. Although the data sources are willing to run data mining algorithms in these cases, they do not want to reveal any extra information about their data to other sources due to legal or competition concerns. One possible solution to this problem is to use cryptographic methods. However, the computation and communication complexity of such solutions render them impractical when a large number of data sources are involved. In this paper, we consider a scenario where multiple data sources are willing to run data mining algorithms over the union of their data as long as each data source is guaranteed that its information that does not pertain to another data source will not be revealed. We focus on the classification problem in particular and present an efficient algorithm for building a decision tree over an arbitrary number of distributed sources in a privacy preserving manner using the ID3 algorithm.

[1]  Jayant R. Haritsa,et al.  A Framework for High-Accuracy Privacy-Preserving Mining , 2005, ICDE.

[2]  Ramakrishnan Srikant,et al.  Implementing P3P using database technology , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[3]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2002, Journal of Cryptology.

[4]  Yuval Ishai,et al.  Protecting data privacy in private information retrieval schemes , 1998, STOC '98.

[5]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[6]  Chris Clifton,et al.  Privacy-preserving distributed mining of association rules on horizontally partitioned data , 2004, IEEE Transactions on Knowledge and Data Engineering.

[7]  Niv Gilboa,et al.  Computationally private information retrieval (extended abstract) , 1997, STOC '97.

[8]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[9]  Thomas G. Dietterich,et al.  Readings in Machine Learning , 1991 .

[10]  Divyakant Agrawal,et al.  Privacy Preserving Query Processing Using Third Parties , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[11]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[12]  Wenliang Du,et al.  Using randomized response techniques for privacy-preserving data mining , 2003, KDD '03.

[13]  Wenliang Du,et al.  Building decision tree classifier on private data , 2002 .

[14]  H. Garcia-Molina,et al.  Enabling Privacy for the Paranoids , 2004 .

[15]  Tal Rabin,et al.  Verifiable secret sharing and multiparty protocols with honest majority , 1989, STOC '89.

[16]  Gu Si-yang,et al.  Privacy preserving association rule mining in vertically partitioned data , 2006 .

[17]  Benny Pinkas,et al.  Secure Computation of the k th-Ranked Element , 2004, EUROCRYPT.

[19]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[20]  Jaideep Vaidya,et al.  Privacy-preserving indexing of documents on the network , 2003, The VLDB Journal.

[21]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[22]  Ann Q. Gates,et al.  TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING , 2005 .

[23]  Benny Pinkas,et al.  Cryptographic techniques for privacy-preserving data mining , 2002, SKDD.

[24]  Alexandre V. Evfimievski,et al.  Information sharing across private databases , 2003, SIGMOD '03.

[25]  Radu Sion,et al.  Resilient Rights Protection for Sensor Streams , 2004, VLDB.

[26]  Hakan Hacigümüs,et al.  Executing SQL over encrypted data in the database-service-provider model , 2002, SIGMOD '02.

[27]  Radu Sion,et al.  Rights protection for relational data , 2003, IEEE Transactions on Knowledge and Data Engineering.

[28]  Silvio Micali,et al.  Computationally Private Information Retrieval with Polylogarithmic Communication , 1999, EUROCRYPT.

[29]  Gene Tsudik,et al.  A Privacy-Preserving Index for Range Queries , 2004, VLDB.

[30]  Peter J. Haas,et al.  A system for watermarking relational databases , 2003, SIGMOD '03.

[31]  Ramakrishnan Srikant,et al.  Hippocratic Databases , 2002, VLDB.

[32]  Chris Clifton,et al.  Tools for privacy preserving distributed data mining , 2002, SKDD.

[33]  A. Yao,et al.  Fair exchange with a semi-trusted third party (extended abstract) , 1997, CCS '97.

[34]  Eyal Kushilevitz,et al.  Private information retrieval , 1998, JACM.

[35]  Jayant R. Haritsa,et al.  Maintaining Data Privacy in Association Rule Mining , 2002, VLDB.

[36]  Yuval Ishai,et al.  Protecting data privacy in private information retrieval schemes , 1998, STOC '98.

[37]  Jennifer Widom,et al.  Vision Paper: Enabling Privacy for the Paranoids , 2004, VLDB.