Evaluating the Classification Accuracy of Data Mining Algorithms for Anonymized Data

Recent advances in hardware technology have increased the capacity to store and record personal data about individuals, raising fears that such data could be misused. To alleviate these concerns, data is anonymized, and many techniques have recently been proposed for performing data mining tasks in ways that preserve privacy. These anonymization techniques draw on several related fields, including data mining, cryptography, and information hiding. Data is anonymized through methods such as randomization, k-anonymity, and l-diversity, and several privacy-preserving data mining algorithms are available in the literature. This paper investigates classification accuracy on data with and without k-anonymization in order to compare the efficiency of privacy-preserving mining. Classification accuracy is evaluated using k-nearest neighbor, J48, and Bagging.
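
As a rough illustration of what k-anonymization means for a dataset before it reaches a classifier, the sketch below generalizes two quasi-identifiers and checks whether every combination of their values occurs at least k times. It is not the procedure used in the paper: the toy records, the field names `age`, `zip`, and `label`, and the helpers `generalize_age` and `is_k_anonymous` are all hypothetical and chosen only for this example.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a coarse range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical toy records; "age" and "zip" are treated as quasi-identifiers,
# and "label" stands in for the class attribute a classifier would predict.
records = [
    {"age": 31, "zip": "47677", "label": "yes"},
    {"age": 35, "zip": "47602", "label": "no"},
    {"age": 38, "zip": "47678", "label": "yes"},
    {"age": 42, "zip": "47905", "label": "no"},
    {"age": 47, "zip": "47909", "label": "no"},
    {"age": 44, "zip": "47906", "label": "yes"},
]

# Generalize: bucket ages into decades and keep only the first three zip digits.
anonymized = [
    {
        "age": generalize_age(r["age"]),
        "zip": r["zip"][:3] + "**",
        "label": r["label"],
    }
    for r in records
]

print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(anonymized, ["age", "zip"], 3))
```

The class label is left untouched, so a classifier could be trained on either version of the records; the generalized attributes carry less detail, which is the loss whose effect on accuracy the study sets out to measure.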
