Analysis of the Univariate Microaggregation Disclosure Risk

Microaggregation is a protection method used by statistical agencies to limit the disclosure risk of confidential information. Formally, microaggregation assigns each original record to a small cluster and then replaces each original value with the centroid of its cluster. As clusters contain at least k records, microaggregation can be considered as preserving k-anonymity. Nevertheless, this holds only when multivariate microaggregation is applied and, moreover, when all variables are microaggregated at the same time. When different variables are protected using univariate microaggregation, k-anonymity is only ensured at the variable level. Therefore, the real k-anonymity decreases for most of the records, and a privacy leak becomes possible. For this reason, the analysis of the disclosure risk is still meaningful in microaggregation. This paper proposes a new record linkage method for univariate microaggregation based on finding the optimal alignment between the original and the protected sorted variables. We show that our method, which uses a dynamic time warping (DTW) distance to compute the optimal alignment, provides the intruder with enough information in many cases to decide whether a link is correct or not. Note that standard record linkage methods never guarantee the correctness of the linkage. Furthermore, we present experiments on two well-known data sets, which show that our method achieves better results (a larger number of correct links) than the best standard record linkage method.
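The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the fixed-size grouping heuristic, the absolute-difference local cost, and the function names `microaggregate` and `dtw` are all assumptions made for illustration. The sketch microaggregates a single variable into groups of at least k consecutive sorted values, then computes a textbook DTW alignment between the sorted original and the sorted protected values.

```python
def microaggregate(values, k):
    """Univariate microaggregation sketch (assumed fixed-size heuristic):
    sort the values, cut them into groups of k consecutive records, and
    replace each value by the mean (centroid) of its group."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    protected = [0.0] * n
    start = 0
    while start < n:
        # The last group absorbs the remainder so every group has >= k records.
        end = n if n - start < 2 * k else start + k
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            protected[i] = centroid
        start = end
    return protected


def dtw(a, b):
    """Textbook dynamic time warping with |x - y| as local cost.
    Returns the total cost and the alignment path as (index_a, index_b) pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    # Backtrack from the end to recover one optimal alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (d[i - 1][j - 1], i - 1, j - 1),
            (d[i - 1][j], i - 1, j),
            (d[i][j - 1], i, j - 1),
        )
    path.reverse()
    return d[n][m], path


# Hypothetical toy data: one confidential variable, protected with k = 3.
original = [10.0, 1.0, 12.0, 2.0, 11.0, 3.0, 13.0]
protected = microaggregate(original, k=3)
cost, path = dtw(sorted(original), sorted(protected))
```

In this sketch the alignment path pairs positions in the sorted original variable with positions in the sorted protected variable. How such an alignment is turned into record links, and how the intruder decides whether a link is correct, is specific to the paper and is not reproduced here.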
