Computational Improvements in Parallelized K-Anonymous Microaggregation of Large Databases

The technical contents of this paper fall within the field of statistical disclosure control (SDC), which concerns the postprocessing of the demographic portion of the statistical results of surveys containing sensitive personal information, in order to effectively safeguard the anonymity of the participating respondents. The concrete purpose of this study is to improve the efficiency of a widely used algorithm for k-anonymous microaggregation (grouping records into clusters of at least k members and replacing each record by its cluster centroid), known as maximum distance to average vector (MDAV). The goal is to vastly accelerate its execution without affecting its functional performance, which compares favorably with that of competing methods. The improvements put forth in this paper comprise algebraic modifications and the use of the Basic Linear Algebra Subprograms (BLAS) library for the efficient parallel computation of MDAV on CPUs.
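To make the two ingredients above more concrete, the following is a minimal pure-Python sketch. It implements the standard MDAV grouping loop as it is usually described in the microaggregation literature, which is not necessarily the authors' exact formulation. The loop is combined with one common algebraic rewriting of the squared Euclidean distance, ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. We assume this rewriting is representative of the kind of algebraic modification the paper refers to. The helper names (`_dot`, `_centroid`, `mdav`, `microaggregate`) are hypothetical. Because ||x||^2 is precomputed once per record and ||c||^2 is constant for a given reference point, ranking records by distance only requires the inner products x.c. For all remaining records at once, these inner products form a single matrix-vector product, the kind of operation a BLAS level-2 routine such as gemv parallelizes on a CPU. The sketch itself uses plain loops and does not call BLAS.

```python
from typing import List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain inner product. In a BLAS-backed version, all products against one
    # reference vector would be computed by a single matrix-vector call (gemv).
    return sum(x * y for x, y in zip(a, b))


def _centroid(points: Sequence[Sequence[float]]) -> Vector:
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def mdav(X: List[Vector], k: int) -> List[List[int]]:
    """Partition the row indices of X into groups of at least k records (MDAV sketch)."""
    if k < 1 or len(X) < k:
        raise ValueError("need 1 <= k <= number of records")

    # ||x||^2 is computed once per record and reused in every distance query.
    sqnorm = [_dot(x, x) for x in X]
    remaining = list(range(len(X)))
    groups: List[List[int]] = []

    def key(i: int, ref: Sequence[float]) -> float:
        # ||x_i - ref||^2 minus the constant ||ref||^2, which does not change rankings.
        return sqnorm[i] - 2.0 * _dot(X[i], ref)

    def farthest_from(ref: Sequence[float]) -> int:
        return max(remaining, key=lambda i: key(i, ref))

    def take_group(r: int) -> None:
        # Group record r with its k-1 nearest remaining records, then remove them.
        nonlocal remaining
        others = [i for i in remaining if i != r]
        group = [r] + sorted(others, key=lambda i: key(i, X[r]))[: k - 1]
        chosen = set(group)
        remaining = [i for i in remaining if i not in chosen]
        groups.append(group)

    while len(remaining) >= 3 * k:
        c = _centroid([X[i] for i in remaining])
        r = farthest_from(c)  # record farthest from the current centroid
        take_group(r)
        # s is the record farthest from r. Picking it after r's group is removed
        # matches the usual formulation except in degenerate distance ties.
        s = farthest_from(X[r])
        take_group(s)

    if len(remaining) >= 2 * k:
        c = _centroid([X[i] for i in remaining])
        take_group(farthest_from(c))

    groups.append(remaining)  # last group holds between k and 2k-1 records
    return groups


def microaggregate(X: List[Vector], k: int) -> List[Vector]:
    """Replace every record by the centroid of its MDAV group."""
    out: List[Vector] = [list(x) for x in X]
    for g in mdav(X, k):
        c = _centroid([X[i] for i in g])
        for i in g:
            out[i] = list(c)
    return out


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
print(mdav(X, 3))  # [[0, 1, 2], [3, 4, 5]]
```

In a BLAS-backed variant, one would stack the remaining records as rows of a matrix and obtain all inner products with the reference vector in one call, for example through NumPy's `@` operator, which typically delegates to the linked BLAS. The ranking logic would stay the same.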
