Generic CBTS: Correlation based Transformation Strategy for Privacy Preserving Data Mining

Mining useful knowledge from a corpus of data has become an important application in many fields. Data mining algorithms such as clustering and classification operate on this data and provide concise information for analysis. As these data reach the public domain through various channels, privacy for the data owners is an increasing need. Privacy can be provided by hiding sensitive data, but doing so impairs knowledge extraction by data mining algorithms, so an effective mechanism is required that protects the data without affecting the mining results. Privacy concerns are a primary hindrance to quality data analysis, whereas data mining algorithms depend on the mathematical nature of the information rather than on its private nature. Therefore, instead of removing or encrypting sensitive data, we propose transformation strategies that retain the statistical, semantic and heuristic nature of the data while masking the sensitive information. The proposed Correlation Based Transformation Strategy (CBTS) combines correlation analysis with data transformation techniques such as Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Non-Negative Matrix Factorization (NNMF) to provide the intended level of privacy preservation while still enabling data analysis. The proposed technique works for numerical, ordinal and nominal data. CBTS is evaluated on standard datasets against popular data mining techniques with significant success, and information entropy is also taken into account.
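The abstract does not spell out the individual steps of CBTS, so the following is only a minimal sketch of the general idea, under stated assumptions: attributes are selected by their Pearson correlation with a sensitive attribute, and only that group is replaced by a low-rank SVD reconstruction. The function names, the `threshold` and `rank` parameters, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def correlated_columns(X, target, threshold=0.5):
    """Indices of columns whose |Pearson r| with column `target` meets the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are treated as variables
    return [j for j in range(X.shape[1]) if abs(corr[target, j]) >= threshold]

def svd_mask(block, rank):
    """Rank-`rank` SVD reconstruction of the mean-centered block (means are restored)."""
    mean = block.mean(axis=0)
    U, s, Vt = np.linalg.svd(block - mean, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean

def cbts_sketch(X, sensitive, threshold=0.5, rank=1):
    """Assumed pipeline: pick attributes correlated with `sensitive`, distort only those."""
    cols = correlated_columns(X, sensitive, threshold)
    out = X.astype(float).copy()
    out[:, cols] = svd_mask(out[:, cols], rank)
    return out, cols

# Synthetic data (hypothetical): three attributes driven by one factor, one independent.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    2.0 * base + 0.1 * rng.normal(size=200),
    -base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
masked, cols = cbts_sketch(X, sensitive=0)
```

In this sketch the individual values in the selected columns change, while their column means and the dominant correlation structure are kept, which is the kind of trade-off the abstract describes. PCA or NNMF could be substituted for the SVD step; NNMF would additionally require non-negative input.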
