A measure of disclosure risk for microdata

Summary. Protection against disclosure is important for statistical agencies releasing microdata files from sample surveys. Simple measures of disclosure risk can provide useful evidence to support decisions about release. We propose a new measure of disclosure risk: the probability that a unique match between a microdata record and a population unit is correct. We argue that this measure has at least two advantages. First, we suggest that it may be a more realistic measure of risk than two measures that are currently used with census data. Second, we show that consistent inference (in a specified sense) may be made about this measure from sample data without strong modelling assumptions. This is a surprising finding, in its contrast with the properties of the two ‘similar’ established measures. As a result, this measure has potentially useful applications to sample surveys. In addition to obtaining a simple consistent predictor of the measure, we propose a simple variance estimator and show that it is consistent. We also consider the extension of inference to allow for certain complex sampling schemes. We present a numerical study based on 1991 census data for about 450 000 enumerated individuals in one area of Great Britain. We show that the theoretical results on the properties of the point predictor of the measure of risk and its variance estimator hold to a good approximation for these data.
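As an illustration of the quantity being proposed, the following minimal sketch computes the probability that a unique match is correct when both the population and the sample are known. The summary does not state a formula, so the formalisation here is an assumption: records are matched on a single categorical key, a "unique match" is a population unit whose key occurs exactly once in the sample, and the match is correct only for the one population unit that was actually sampled. The function name `unique_match_risk` and the toy data are hypothetical.

```python
from collections import Counter


def unique_match_risk(population_keys, sample_indices):
    """Share of unique matches that are correct (assumed formalisation).

    population_keys: key value for every population unit.
    sample_indices:  positions of the sampled units in population_keys.
    """
    pop_counts = Counter(population_keys)
    sample_counts = Counter(population_keys[i] for i in sample_indices)

    # Keys that occur exactly once in the sample (sample uniques).
    unique_keys = [k for k, f in sample_counts.items() if f == 1]
    if not unique_keys:
        return None  # no unique matches are possible

    # Each sample-unique key gives exactly one correct match ...
    correct = len(unique_keys)
    # ... among all population units sharing that key.
    candidates = sum(pop_counts[k] for k in unique_keys)
    return correct / candidates


# Toy population of 10 units and a sample of 5 (illustrative only).
population = ["a", "a", "a", "b", "c", "c", "d", "d", "d", "d"]
sample = [0, 3, 4, 6, 7]  # sampled keys: a, b, c, d, d
risk = unique_match_risk(population, sample)
print(risk)
```

In this toy case the sample uniques are the keys a, b and c, which cover 3 + 1 + 2 = 6 population units, so 3 of the 6 possible unique matches are correct. In practice the population counts are unknown, which is why the paper focuses on inference about the measure from sample data.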
