A novel data distortion approach via selective SSVD for privacy protection

Data privacy preservation has become one of the major concerns in the design of practical data-mining applications. In this paper, we propose a novel data distortion approach based on structural partition and the Sparsified Singular Value Decomposition (SSVD) technique. Three schemes are designed to balance privacy protection in centralised datasets against mining accuracy. Several metrics are used to evaluate the performance of the proposed schemes. The data utility of the three schemes is examined through binary classification with a support vector machine. Furthermore, we examine three sparsification strategies. The effect of the method parameters on data distortion level and data utility is also studied experimentally. Our experimental results on synthetic and real datasets indicate that, in comparison with standard data distortion techniques, the proposed schemes are efficient in balancing data distortion level and data utility. They offer a feasible solution that shows good promise for mining accuracy and significantly reduces the computational cost of the SVD.
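As a rough illustration of the general idea behind SSVD-based distortion, the sketch below truncates the SVD of a data matrix to a chosen rank and then zeroes small entries of the singular vector matrices before rebuilding the data. It is a minimal sketch under stated assumptions, not the paper's method: the function names `ssvd_distort` and `relative_distortion`, the parameters `rank` and `eps`, and the single magnitude-threshold sparsification rule are all hypothetical choices made here. The structural partition, the three schemes, and the three sparsification strategies mentioned in the abstract are not specified there and are not reproduced.

```python
import numpy as np


def ssvd_distort(A, rank, eps):
    """Return a distorted copy of A via a truncated, sparsified SVD.

    Assumed (hypothetical) sparsification rule: entries of the truncated
    singular vector matrices whose magnitude is below eps are set to zero.
    """
    # Thin SVD: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only the leading `rank` singular triplets.
    U_k = U[:, :rank].copy()
    s_k = s[:rank]
    Vt_k = Vt[:rank, :].copy()

    # Sparsify: drop small entries of the singular vectors.
    U_k[np.abs(U_k) < eps] = 0.0
    Vt_k[np.abs(Vt_k) < eps] = 0.0

    # Rebuild the distorted data matrix.
    return (U_k * s_k) @ Vt_k


def relative_distortion(A, A_bar):
    """Relative Frobenius-norm difference between original and distorted data."""
    return np.linalg.norm(A - A_bar, "fro") / np.linalg.norm(A, "fro")


# Synthetic data for illustration only: 200 records, 10 attributes.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
A_bar = ssvd_distort(A, rank=6, eps=1e-3)
print(A_bar.shape, relative_distortion(A, A_bar))
```

In this sketch, a smaller `rank` or a larger `eps` would be expected to increase the distortion level, while data utility could be checked by training the same classifier on `A` and on `A_bar` and comparing accuracy.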
