Privacy-Preserving Data Mining Based on Sample Selection and Singular Value Decomposition

To improve privacy-preserving data mining (PPDM) methods based on matrix decomposition, this paper proposes a new PPDM method that combines sample selection with matrix decomposition. Existing matrix decomposition-based methods use the decomposition for attribute extraction: they analyze the data, retain the information that is important for data mining, and discard the unimportant information in order to perturb the data. Sample selection offers a second way to analyze the data. The basic idea of the proposed method is that using sample selection and matrix decomposition together should identify the important information more accurately. Experiments show that the new method preserves privacy better than methods that use matrix decomposition alone, while maintaining data utility.
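The general idea can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the abstract does not specify the selection rule or how the two steps are combined, so both are assumptions here. The hypothetical `select_samples` keeps the rows nearest each class centroid, and the hypothetical `perturb` computes a truncated SVD basis from the selected rows only, then projects every row onto that basis. The names `keep_fraction` and `rank` are illustrative parameters, and the data is synthetic.

```python
import numpy as np


def select_samples(data, labels, keep_fraction=0.5):
    """Assumed selection rule: per class, keep the rows nearest the class centroid."""
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        dist = np.linalg.norm(data[idx] - data[idx].mean(axis=0), axis=1)
        n_keep = max(1, int(round(keep_fraction * idx.size)))
        keep.extend(idx[np.argsort(dist)[:n_keep]])
    return np.sort(np.array(keep))


def perturb(data, labels, rank, keep_fraction=0.5):
    """Assumed combination: SVD basis from selected rows, applied to all rows."""
    selected = select_samples(data, labels, keep_fraction)
    # Right singular vectors of the selected samples span the attribute subspace.
    _, _, vt = np.linalg.svd(data[selected], full_matrices=False)
    basis = vt[:rank]  # shape: (rank, n_attributes)
    # Projecting onto the retained subspace discards the remaining components,
    # which is what perturbs the released data.
    return data @ basis.T @ basis


# Synthetic example: 40 samples, 6 attributes, 2 classes.
rng = np.random.default_rng(0)
data = rng.normal(size=(40, 6))
labels = np.repeat([0, 1], 20)
perturbed = perturb(data, labels, rank=3)
```

A smaller `rank` discards more of the original matrix, which distorts the data more but also risks lowering utility. How that trade-off should be measured is not covered by this sketch.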
