PMA for Privacy Preservation in Data Mining

Privacy is an increasingly important issue in many data mining applications, which has driven the development of many privacy-preserving data mining (PPDM) techniques. In recent years, various data mining algorithms incorporating privacy-preserving techniques have been developed to hide sensitive identifiers or patterns. When applying such techniques, both data utility and information loss must be considered. In this paper we propose Perturbed Micro Aggregation (PMA), a technique based on Statistical Disclosure Control (SDC), for anonymizing individual records. Experimental results indicate that the proposed technique prevents the disclosure of sensitive data without degrading data utility. We also discuss future work and promising directions for privacy preservation in data mining.

Keywords: PPDM; privacy; microaggregation; microdata; anonymization; data mining
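For readers unfamiliar with the underlying idea, the following is a minimal sketch of generic fixed-size univariate microaggregation followed by additive noise. It is not the PMA algorithm proposed in the paper, whose details are not given here. The function names `microaggregate` and `perturb`, the group-size parameter `k`, and the Gaussian noise with standard deviation `scale` are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def microaggregate(values, k):
    """Generic fixed-size univariate microaggregation (illustrative only).

    Records are sorted, cut into consecutive groups of at least k records,
    and each value is replaced by the mean of its group.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    # Indices of the records in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n

    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into the last group so every group has >= k records.
        if n - end < k:
            end = n
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
        start = end
    return result


def perturb(values, scale, rng):
    """Add zero-mean Gaussian noise to each value (an assumed perturbation step)."""
    return [v + rng.gauss(0.0, scale) for v in values]


if __name__ == "__main__":
    ages = [10, 12, 11, 50, 52, 51, 90]
    aggregated = microaggregate(ages, k=3)
    released = perturb(aggregated, scale=0.5, rng=random.Random(0))
    print(aggregated)
    print(released)
```

In this sketch every released value is shared by at least `k` records before noise is added, which is the property microaggregation relies on to limit re-identification. Replacing values by group means also keeps the overall sum of the attribute unchanged, one simple way to see the utility side of the trade-off the abstract mentions.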
