Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study

This paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on fully synthetic data for a variety of descriptive and analytic estimands, to assess the degree of confidentiality protection afforded by fully synthetic data, and to illustrate the specification of synthetic data imputation models. Benefits and limitations of releasing fully synthetic data sets are discussed. Copyright 2005 Royal Statistical Society.
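
As a rough illustration of what "inferences based on fully synthetic data" involves, the sketch below combines point and variance estimates computed separately on each of m released synthetic data sets. It assumes the combining rules commonly given in the fully synthetic data literature (for example in "Multiple Imputation for Statistical Disclosure Limitation"), in which the variance estimate is (1 + 1/m) b_m - u_bar; the abstract itself does not state these formulas. The function name `combine_fully_synthetic` and the input values are hypothetical.

```python
from statistics import mean, variance


def combine_fully_synthetic(point_estimates, variance_estimates):
    """Combine per-data-set results from m fully synthetic data sets.

    Assumed rule (not taken from the abstract above):
        q_bar = mean of the m point estimates
        b_m   = sample variance of the m point estimates (divisor m - 1)
        u_bar = mean of the m within-data-set variance estimates
        t_f   = (1 + 1/m) * b_m - u_bar
    """
    m = len(point_estimates)
    if m < 2 or len(variance_estimates) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")

    q_bar = mean(point_estimates)
    b_m = variance(point_estimates)   # between-data-set variance
    u_bar = mean(variance_estimates)  # average within-data-set variance
    t_f = (1 + 1 / m) * b_m - u_bar   # can be negative for small m

    return q_bar, b_m, u_bar, t_f


# Hypothetical estimates of one quantity from m = 3 synthetic data sets.
q_bar, b_m, u_bar, t_f = combine_fully_synthetic(
    [10.0, 12.0, 14.0], [1.0, 1.0, 1.0]
)
```

Because u_bar is subtracted rather than added, t_f can come out negative when m is small or the between-data-set variance is low, so an analyst using this rule would need a fallback for that case.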