A comparison of multiple imputation and data perturbation for masking numerical variables

Statistical disclosure limitation techniques are designed to give legitimate users access to useful data while preventing disclosure of sensitive information. Two techniques that can be used to limit disclosure of sensitive numerical data are multiple imputation and data perturbation. While many studies have addressed the effectiveness of perturbation and multiple imputation individually, no studies have directly compared the two techniques. In this study, we compare the effectiveness of multiple imputation and data perturbation for numerical microdata. The results indicate that, in the absence of missing data, data perturbation performs better than multiple imputation. In addition, because data perturbation releases only a single perturbed data set, whereas multiple imputation releases several imputed data sets, it eases the burden on users of such data.
