Statistical Disclosure Limitation for Health Data: A Statistical Agency Perspective

Statistical agencies release health data collected in surveys, censuses and registers. This chapter presents statistical disclosure limitation (SDL) from the perspective of statistical agencies. Traditional outputs, in the form of survey microdata and tabular data, are presented first with respect to quantifying disclosure risk, common SDL techniques for protecting the data, and measuring information loss. In recent years, however, demand for data has grown, including through government 'open data' initiatives, and this has led statistical agencies to examine additional forms of disclosure risk related to the concept of differential privacy in the computer science literature. The chapter discusses whether the SDL practices that statistical agencies apply to traditional outputs are differentially private. It concludes by presenting some innovative data dissemination strategies that statistical agencies are currently assessing for settings where stricter privacy guarantees are necessary.
