Protection of micro-data subject to edit constraints against Statistical Disclosure

Before releasing statistical outputs, data suppliers have to assess whether the privacy of statistical units is endangered and, if necessary, apply Statistical Disclosure Control (SDC) methods. SDC methods perturb, modify or summarize the data, depending on whether the data are released as micro-data or as tabular data. The goal is to choose a method that manages disclosure risk while preserving the quality of the statistical data. In this article we discuss the effect of applying basic SDC methods to continuous and categorical variables for data masking. Perturbative SDC methods change the values of variables in the records. Changing values, however, will likely distort totals and other sufficient statistics, and will also cause fully edited records in micro-data to fail edit constraints, resulting in low-quality data. Moreover, an inconsistent record signals that it has been perturbed for disclosure control, and attempts can then be made to unmask the data. To deal with these problems, we develop new strategies for implementing the basic perturbation methods that are commonly used at statistical agencies. These strategies minimize record-level edit failures as well as overall measures of information loss.
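To make the edit-failure problem concrete, the following minimal Python sketch perturbs a few records that obey a single additive edit (two components must sum to a total). Everything in it is an illustrative assumption rather than the method developed in the article: the variable names (`wages`, `self_employment`, `total_income`), the noise standard deviation, and the simple repair of recomputing the total from the perturbed components. It only shows that independent noise addition breaks the edit, while a perturbation that respects the edit does not.

```python
import random


def satisfies_edit(record, tol=1e-9):
    """Additive edit (assumed): wages + self_employment == total_income."""
    gap = record["wages"] + record["self_employment"] - record["total_income"]
    return abs(gap) <= tol


def add_noise_independently(record, sd, rng):
    """Perturb every variable on its own, ignoring the edit."""
    return {name: value + rng.gauss(0.0, sd) for name, value in record.items()}


def add_noise_preserving_edit(record, sd, rng):
    """Perturb the two components only, then rebuild the total from them."""
    wages = record["wages"] + rng.gauss(0.0, sd)
    self_emp = record["self_employment"] + rng.gauss(0.0, sd)
    return {
        "wages": wages,
        "self_employment": self_emp,
        "total_income": wages + self_emp,
    }


# Hypothetical, fully edited records: each one satisfies the edit.
records = [
    {"wages": 30000.0, "self_employment": 5000.0, "total_income": 35000.0},
    {"wages": 42000.0, "self_employment": 0.0, "total_income": 42000.0},
    {"wages": 12000.0, "self_employment": 8000.0, "total_income": 20000.0},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
naive = [add_noise_independently(r, 50.0, rng) for r in records]
consistent = [add_noise_preserving_edit(r, 50.0, rng) for r in records]

naive_failures = sum(not satisfies_edit(r) for r in naive)
consistent_failures = sum(not satisfies_edit(r) for r in consistent)
print("edit failures, independent noise:", naive_failures)
print("edit failures, edit-preserving noise:", consistent_failures)
```

Rebuilding the total keeps every record consistent, but it does nothing to preserve the overall total across records, and noise can still violate other edits such as non-negativity. That trade-off between record-level consistency and aggregate information loss is the one the abstract describes.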
