ANONYMIZATION BASED ON NESTED CLUSTERING FOR PRIVACY PRESERVATION IN DATA MINING

Privacy preservation in data mining protects data against the unauthorized extraction of information. Data anonymization techniques achieve this by modifying the data so that the original values cannot easily be recovered. Perturbation techniques are widely used, but they can greatly degrade data quality: there is a trade-off between privacy preservation and information loss, and that loss in turn affects the results of data mining. The method proposed in this paper is based on nested clustering of the data followed by perturbation within each cluster. Cluster sizes are kept optimal to reduce information loss. The paper explains the methodology, implementation and results of nested clustering, and provides several metrics to show that the method overcomes the disadvantages of other perturbation methods.
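The abstract does not specify which clustering algorithm, perturbation rule or size criterion the method uses, so the following is only a minimal sketch of the general idea under stated assumptions. It recursively splits the data with plain 2-means until each leaf cluster holds at most `max_size` records (refusing splits that would leave fewer than `min_size` records), then adds Gaussian noise scaled by each leaf cluster's own per-attribute spread. The names `max_size`, `min_size`, `ratio`, `anonymize` and `information_loss` are hypothetical illustrations, not the paper's parameters or metrics.

```python
import random
from statistics import mean, pstdev


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means on numeric tuples; returns only the non-empty clusters."""
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centre.
        centers = [tuple(mean(col) for col in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def nested_clusters(points, max_size, min_size, rng, k=2):
    """Recursively split until a cluster holds at most max_size records,
    refusing any split that would leave a child smaller than min_size."""
    if len(points) <= max_size:
        return [points]
    parts = kmeans(points, k, rng)
    if len(parts) < 2 or min(len(c) for c in parts) < min_size:
        return [points]  # cannot split safely: keep this cluster as a leaf
    leaves = []
    for part in parts:
        leaves.extend(nested_clusters(part, max_size, min_size, rng, k))
    return leaves


def perturb_cluster(cluster, ratio, rng):
    """Add Gaussian noise scaled by the cluster's own per-attribute std. dev."""
    sigmas = [pstdev(col) for col in zip(*cluster)]
    return [tuple(x + rng.gauss(0.0, ratio * s) for x, s in zip(rec, sigmas))
            for rec in cluster]


def anonymize(records, max_size=8, min_size=3, ratio=0.5, seed=0):
    """Hypothetical pipeline: nested clustering, then per-cluster perturbation."""
    rng = random.Random(seed)
    leaves = nested_clusters(list(records), max_size, min_size, rng)
    originals = [rec for leaf in leaves for rec in leaf]
    released = [rec for leaf in leaves for rec in perturb_cluster(leaf, ratio, rng)]
    return originals, released, leaves


def information_loss(originals, released):
    """One simple loss metric: mean absolute change per attribute value."""
    return mean(abs(a - b) for o, r in zip(originals, released)
                for a, b in zip(o, r))


# Usage on synthetic (age, income) records.
data_rng = random.Random(42)
data = [(data_rng.gauss(30, 5), data_rng.gauss(50_000, 8_000)) for _ in range(40)]
orig, rel, leaves = anonymize(data)
print([len(leaf) for leaf in leaves])
print(round(information_loss(orig, rel), 2))
```

Because the noise is scaled by each small cluster's spread rather than the spread of the whole dataset, distortion stays local, which is one plausible reading of why keeping cluster sizes small reduces information loss. In practice the attributes would likely be normalized before clustering, since here the income column dominates the Euclidean distance.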
