A new scheme on privacy-preserving data classification

We address the privacy-preserving classification problem in a distributed system. Randomization has been the approach proposed to preserve privacy in such a scenario. However, this approach has since been shown to be insecure: privacy intrusion techniques have been discovered that can reconstruct private information from the randomized data tuples. We introduce a scheme based on algebraic techniques. Compared to the randomization approach, our new scheme can build more accurate classifiers while disclosing less private information. Furthermore, our new scheme can be readily integrated with existing systems as middleware.
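To make the weakness of randomization concrete, here is a minimal sketch of one kind of reconstruction attack, a spectral-filtering (PCA-style) projection. It rests on several assumptions that are not from this work: the randomization operator is additive i.i.d. Gaussian noise, the synthetic private attributes are strongly correlated (all driven by one hidden factor), and the attacker keeps only the dominant principal direction. All function names are hypothetical, and this illustrates the attack on randomization, not the algebraic scheme proposed here. It uses only the Python standard library (power iteration instead of a linear-algebra package).

```python
import math
import random


def randomize(rows, sigma, rng):
    """Assumed randomization: add i.i.d. N(0, sigma^2) noise to every attribute."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]


def covariance(rows):
    """Column means and sample covariance matrix of equal-length rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for r in rows:
        d = [r[j] - means[j] for j in range(m)]
        for i in range(m):
            for j in range(m):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return means, cov


def top_eigenvector(mat, iters=200):
    """Dominant eigenvector of a symmetric matrix via power iteration."""
    m = len(mat)
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_filter(noisy):
    """Project each centred randomized row onto the dominant principal direction.

    Correlated data concentrates in that direction, while the noise is spread
    evenly over all attributes, so the projection discards most of the noise.
    """
    means, cov = covariance(noisy)
    v = top_eigenvector(cov)
    out = []
    for r in noisy:
        d = [r[j] - means[j] for j in range(len(r))]
        coef = sum(d[j] * v[j] for j in range(len(r)))
        out.append([means[j] + coef * v[j] for j in range(len(r))])
    return out


def rmse(a, b):
    """Root-mean-square difference between two tables of the same shape."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(r) for r in a)
    return math.sqrt(total / count)


def demo(seed=0, n=500, m=8, sigma=2.0):
    """Hypothetical private data: every attribute is a scaled copy of one hidden factor."""
    rng = random.Random(seed)
    weights = [1.0 + 0.1 * j for j in range(m)]
    original = [[w * z for w in weights] for z in (rng.gauss(0.0, 5.0) for _ in range(n))]
    noisy = randomize(original, sigma, rng)
    recovered = spectral_filter(noisy)
    return rmse(original, noisy), rmse(original, recovered)


noise_err, filtered_err = demo()
print(f"RMSE of randomized data:    {noise_err:.2f}")
print(f"RMSE after spectral filter: {filtered_err:.2f}")
```

Under these assumptions the filtered estimate lies much closer to the original tuples than the randomized tuples do, which is the sense in which randomization can leak private values. With weakly correlated attributes the gain shrinks, so the sketch shows a worst case for randomization rather than a general bound.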
