Post Anonymization Techniques in Privacy Preserved Data Mining

Privacy-preserving data mining is concerned with protecting privacy while retaining the utility of the data. Privacy becomes a key concern when medical data is published for research purposes. Anonymization techniques can be used to transform the dataset into less specific values before publishing, guarding against privacy breaches. However, privacy preservation may reduce the utility of the data, and classification can help recover that utility from the anonymized data. We propose a model in which a multi-decision tree classifier is built on the anonymized dataset to improve utility. The multi-decision tree classifier is constituted by an Improved ID3 based AdaBoost classifier. The proposed approach differs from earlier work in two respects: the classifier is a multi-decision tree rather than a single tree, and it is constructed on the anonymized dataset. It is shown to be better than the pure decision tree classifier, as the multi-decision tree classifier achieves higher accuracy and a shorter training duration than the normal ID3 based AdaBoost classifier.
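The anonymization step described above, transforming values into less specific ones before publishing, can be illustrated with a minimal sketch. It assumes a generalization scheme in the spirit of k-anonymity; the field names, the 10-year age bands, the 3-digit ZIP prefix, and the sample records are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def generalize(record):
    """Replace exact quasi-identifier values with coarser ones (assumed scheme)."""
    age_low = (record["age"] // 10) * 10
    return {
        "age": f"{age_low}-{age_low + 9}",     # exact age -> 10-year band
        "zip": record["zip"][:3] + "**",       # keep only the 3-digit prefix
        "diagnosis": record["diagnosis"],      # sensitive attribute, left unchanged
    }

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical records, for illustration only.
patients = [
    {"age": 34, "zip": "47677", "diagnosis": "flu"},
    {"age": 36, "zip": "47602", "diagnosis": "asthma"},
    {"age": 52, "zip": "47905", "diagnosis": "flu"},
    {"age": 58, "zip": "47909", "diagnosis": "diabetes"},
]
anonymized = [generalize(p) for p in patients]

print(is_k_anonymous(patients, ["age", "zip"], k=2))
print(is_k_anonymous(anonymized, ["age", "zip"], k=2))
```

A classifier such as the one proposed here would then be trained on records like those in `anonymized` rather than on the raw values, which is where the trade-off between privacy and utility arises.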
