Rethinking rank swapping to decrease disclosure risk

The need for privacy motivates methods that protect a microdata file while both minimizing disclosure risk and preserving data utility. Rank swapping is a very popular microdata protection method, and record linkage is the standard mechanism for measuring the disclosure risk of such methods. In this paper we present a new record linkage method, specific to rank swapping, which obtains more correct links than standard record linkage methods. Consequently, rank swapping carries a higher disclosure risk than previously believed. Motivated by this, we present two new variants of rank swapping against which the new record linkage technique is ineffective. The real disclosure risk of these variants is therefore lower than that of standard rank swapping.
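As an illustration of the method under discussion, the following is a minimal sketch of standard rank swapping for a single numeric variable, assuming the usual formulation in which values are ranked and each value is swapped with another whose rank differs by at most p percent of the number of records. The function name `rank_swap`, the parameter names, and the choice to look only at higher ranks for a partner are assumptions made for this example; they are not taken from the paper, and the paper's new linkage method and variants are not reproduced here.

```python
import random


def rank_swap(values, p, rng=None):
    """Rank-swap one numeric variable (illustrative sketch, not the paper's code).

    p is the maximum allowed rank distance between swap partners,
    expressed as a percentage of the number of records.
    """
    rng = rng or random.Random()
    n = len(values)
    window = max(1, int(p * n / 100))

    # order[r] is the index of the record that holds rank r.
    order = sorted(range(n), key=lambda i: values[i])
    masked = list(values)
    swapped = [False] * n  # indexed by rank

    for r in range(n):
        if swapped[r]:
            continue
        # Unswapped partners whose rank is at most `window` positions above r.
        candidates = [s for s in range(r + 1, min(n, r + window + 1))
                      if not swapped[s]]
        if not candidates:
            continue  # no partner left in range; the value stays unchanged
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        masked[i], masked[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return masked


if __name__ == "__main__":
    data = [31, 12, 47, 25, 8, 39, 18, 52, 5, 43]
    print(rank_swap(data, p=20, rng=random.Random(0)))
```

In this sketch every masked value stays within a bounded rank distance of the original value, and the multiset of values of the variable is preserved. To protect a file with several variables, the procedure would be applied to each variable independently.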
