Privacy-preserving classification using a modified C4.5

Datasets supplied to third parties for data mining must be protected so that they cannot be used for secondary purposes. C4.5 is a decision tree classification algorithm that handles mixed datasets, that is, datasets with both categorical and continuous attributes. Data perturbation is an important technique in data privacy. This paper proposes a modified C4.5 that performs classification on perturbed and unrealized datasets. The decision tree is built using the gain ratio as the split criterion, and the gain ratio is computed from the unrealized and perturbed datasets. Experimental results were obtained through simulation in Weka.
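For reference, the split criterion can be sketched as follows. This is a minimal Python illustration of the standard C4.5 gain ratio (information gain divided by split information) for a single categorical attribute, computed on plain, unmodified values. It does not reproduce how the modified C4.5 derives these quantities from unrealized and perturbed datasets, which the abstract does not specify; the function names `entropy` and `gain_ratio` are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Standard C4.5 gain ratio for splitting `labels` on categorical `values`.

    Illustrative sketch on ordinary (unperturbed, realized) data only.
    """
    n = len(labels)
    # Group the class labels by attribute value (one branch per value).
    partitions = defaultdict(list)
    for v, y in zip(values, labels):
        partitions[v].append(y)

    # Information gain: parent entropy minus size-weighted child entropy.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder

    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in partitions.values())

    # A single-valued attribute has zero split information; treat it as useless.
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    print(gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))
```

Here an attribute that separates the two classes perfectly into two equal branches yields a gain ratio of 1.0, while an attribute whose branches each contain both classes equally yields 0.0. The tree builder would evaluate this score for every candidate attribute and split on the highest one.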
