Efficient k-anonymous microaggregation of multivariate numerical data via principal component analysis

Abstract: k-Anonymous microaggregation is a widespread technique for protecting the privacy of respondents beyond the mere suppression of their identifiers, in applications where preserving the utility of the disclosed information is critical. Unfortunately, microaggregation methods with high data utility may impose stringent computational demands when dealing with datasets containing a large number of records and attributes. This work proposes and analyzes various anonymization methods that draw upon the algebraic-statistical technique of principal component analysis (PCA) in order to effectively reduce the number of attributes processed, that is, the dimension of the multivariate microaggregation problem at hand. By preserving to a high degree the energy of the numerical dataset and carefully choosing the number of dominant components to process, we manage to achieve remarkable reductions in running time and memory usage with negligible impact on information utility. Our methods are readily applicable to high-utility statistical disclosure control (SDC) of large-scale datasets with numerical demographic attributes. © 2019 The Authors. Preprint submitted to Elsevier, Inc.
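As a rough illustration of the idea summarized above, the following Python sketch projects a numerical dataset onto its dominant principal components, builds a k-anonymous partition in that lower-dimensional space, and then replaces each original record with the centroid of its cell. It is a minimal sketch under stated assumptions, not the methods proposed in the paper: the function names (pca_reduce, microaggregate, anonymize), the 0.95 energy threshold, the use of centering without rescaling, and the greedy farthest-record grouping (a simplified heuristic in the spirit of fixed-size microaggregation) are all illustrative choices.

```python
import numpy as np


def pca_reduce(X, energy=0.95):
    """Project centered data onto the fewest principal components whose
    cumulative variance reaches the requested fraction of total energy."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    # Smallest m with cum[m - 1] >= energy, clamped to the available rank.
    m = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    return Xc @ Vt[:m].T, m


def microaggregate(Z, k):
    """Greedy partition into cells of at least k records (assumed heuristic):
    repeatedly group the record farthest from the centroid of the remaining
    records with its nearest neighbors, k records per cell."""
    n = Z.shape[0]
    if n < k:
        raise ValueError("need at least k records")
    labels = np.full(n, -1)
    remaining = np.arange(n)
    cell = 0
    while remaining.size >= 2 * k:
        R = Z[remaining]
        far = np.argmax(((R - R.mean(axis=0)) ** 2).sum(axis=1))
        dist = ((R - R[far]) ** 2).sum(axis=1)
        nearest = np.argsort(dist)[:k]
        labels[remaining[nearest]] = cell
        remaining = np.delete(remaining, nearest)
        cell += 1
    # The last k to 2k - 1 records form the final cell.
    labels[remaining] = cell
    return labels


def anonymize(X, k, energy=0.95):
    """Partition in the reduced PCA space, then aggregate the ORIGINAL
    attributes by replacing each record with its cell centroid."""
    Z, m = pca_reduce(X, energy)
    labels = microaggregate(Z, k)
    Y = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        Y[idx] = X[idx].mean(axis=0)
    return Y, labels, m
```

Calling anonymize(X, k=5) on an array X with one row per record returns the perturbed records, the cell label of each record, and the number of components retained. Only the neighbor searches run in the reduced space, which is where a saving in time and memory would be expected; the centroids are still computed over all original attributes.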
