Privacy-preserving multi-party decision tree induction

Data mining extracts useful knowledge from large amounts of data, which often must first be collected. When the data are distributed among several parties, privacy concerns may prevent those parties from sharing the data directly, or even from sharing certain types of information about them. How multiple parties can collaboratively conduct data mining without breaching data privacy is therefore a major challenge. In this paper, we propose a randomisation-based scheme that lets multiple parties conduct data mining computations without disclosing their actual data sets to one another.
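The abstract does not spell out the randomisation scheme, so the following is only a minimal sketch of the general idea, using the classic randomised response technique for a single binary attribute: each record reports its true value with probability `theta` and the complement otherwise, and an aggregate proportion is then recovered from the noisy reports. The function names, the parameter `theta`, and the single-attribute setting are assumptions made for illustration, not the scheme proposed in the paper.

```python
import random


def randomise(bit, theta, rng):
    """Report the true bit with probability theta, otherwise its complement."""
    return bit if rng.random() < theta else 1 - bit


def estimate_proportion(reported, theta):
    """Estimate the true proportion of 1s from randomised reports.

    If p is the true proportion, the expected reported proportion is
    lam = p * theta + (1 - p) * (1 - theta); solving for p gives
    p = (lam + theta - 1) / (2 * theta - 1), which is undefined at theta = 0.5.
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 destroys all information about the data")
    lam = sum(reported) / len(reported)
    return (lam + theta - 1) / (2 * theta - 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    # Hypothetical private data: 30% of the records carry the sensitive value 1.
    true_bits = [1] * 6000 + [0] * 14000
    reported = [randomise(b, 0.8, rng) for b in true_bits]
    print("estimated proportion:", estimate_proportion(reported, 0.8))
```

A count estimated this way could in principle feed a quantity such as the class distribution at a decision tree node, although no individual report reveals the underlying value with certainty. Values of `theta` closer to 0.5 give stronger privacy at the cost of a noisier estimate.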
