Evaluating Re-Identification Risks of Data Protected by Additive Data Perturbation

Commercial organizations and government agencies that gather, store, share and disseminate data face increasing concerns over individual privacy and confidentiality. Confidential data are often masked, either in the database or before release to a third party, using methods such as data perturbation. In this study, the re-identification risks of three major additive data perturbation techniques were compared using two different record linkage techniques. The results suggest that the re-identification risk of Kim's multivariate noise addition method is similar to that of the simple noise addition method. The general additive data perturbation (GADP) method has the lowest re-identification risk and therefore provides the highest level of protection. The study also suggests that Fuller's method of assessing re-identification risk may be better suited to numeric data than Winkler's probabilistic record linkage method. These results should help organizations and government agencies choose an appropriate additive perturbation technique.
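To make "re-identification risk" concrete, the following is a minimal Python sketch of the simplest of the three techniques, simple noise addition, together with a naive linkage step that counts how often a masked record is matched back to its true source. It rests on stated assumptions: the noise variance for each attribute is a fixed multiple (`noise_ratio`) of that attribute's variance, the intruder holds the original records, and linkage uses nearest-neighbour matching under a standardized Euclidean distance. The function names are hypothetical. The linkage step is a simplified stand-in, not Fuller's or Winkler's method, and neither Kim's method nor GADP is implemented here.

```python
import random
import statistics
from typing import List, Sequence

Record = List[float]


def simple_noise_addition(data: Sequence[Record], noise_ratio: float, seed: int = 0) -> List[Record]:
    """Mask each numeric attribute by adding independent Gaussian noise.

    Assumption (illustrative): noise variance for attribute k is
    noise_ratio * Var(X_k).
    """
    rng = random.Random(seed)
    sds = [statistics.pstdev(col) for col in zip(*data)]
    scales = [(noise_ratio ** 0.5) * sd for sd in sds]
    return [[x + rng.gauss(0.0, s) for x, s in zip(row, scales)] for row in data]


def linkage_rate(original: Sequence[Record], masked: Sequence[Record]) -> float:
    """Share of masked records whose nearest original record is their true source.

    Hypothetical intruder model: link each masked record to the closest
    original record under a standardized Euclidean distance.
    This is NOT Fuller's or Winkler's record linkage method.
    """
    # Guard against zero-variance attributes by falling back to a scale of 1.
    sds = [statistics.pstdev(col) or 1.0 for col in zip(*original)]

    def dist2(a: Record, b: Record) -> float:
        return sum(((x - y) / sd) ** 2 for x, y, sd in zip(a, b, sds))

    hits = 0
    for i, m in enumerate(masked):
        nearest = min(range(len(original)), key=lambda j: dist2(m, original[j]))
        hits += nearest == i  # a correct re-identification
    return hits / len(masked)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic "confidential" data: 200 records, 3 numeric attributes.
    data = [[rng.gauss(50, 10), rng.gauss(100, 25), rng.gauss(0, 1)] for _ in range(200)]
    for ratio in (0.01, 0.1, 1.0):
        masked = simple_noise_addition(data, ratio, seed=7)
        print(f"noise ratio {ratio}: linkage rate {linkage_rate(data, masked):.2f}")
```

Raising `noise_ratio` should lower the linkage rate, but it also distorts the released values more. The techniques compared in the study differ in how they manage that trade-off between protection and analytical usefulness.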
