An Empirical Comparison of Some Methods for Disclosure Risk Assessment

When public-use microdata files are released, it is important to assess the risk of disclosing individual information. A measure of disclosure risk often considered in the literature is the proportion of unique records in the file that are also unique in the population. Various methods based on superpopulation models have been proposed for estimating this quantity from sample data. An empirical comparison of a selection of models applied to three real-life data sets is presented. The general conclusion is that no one model is uniformly best with respect to the risk measure used and that performance varies greatly between different types of data.
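The risk measure described above can be illustrated with a minimal sketch for the idealized case where the full population is observed. In practice it is not, which is why model-based estimates are needed. The function name `unique_risk`, the toy records, and the choice of key variables are hypothetical and serve only to make the definition concrete; this is not one of the estimation methods compared in the paper.

```python
from collections import Counter


def unique_risk(population_keys, sample_indices):
    """Proportion of sample-unique records that are also population-unique.

    population_keys: list of hashable key combinations, one per population record.
    sample_indices: positions of the population records released in the file.
    Assumes the whole population is known, which is rarely true in practice.
    """
    pop_counts = Counter(population_keys)
    sample_counts = Counter(population_keys[i] for i in sample_indices)

    # Key combinations that occur exactly once in the released file.
    sample_uniques = [key for key, count in sample_counts.items() if count == 1]
    if not sample_uniques:
        return 0.0

    # Of those, count the ones that also occur exactly once in the population.
    also_pop_unique = sum(1 for key in sample_uniques if pop_counts[key] == 1)
    return also_pop_unique / len(sample_uniques)


# Toy population keyed on (sex, age); values are made up for illustration.
population = [("F", 30), ("F", 30), ("M", 45), ("M", 52), ("F", 61), ("F", 61)]
sample = [0, 2, 3, 4]
risk = unique_risk(population, sample)
```

Here all four sampled records are unique in the file, but only the two male records are unique in the population, so half of the sample uniques carry the higher risk.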
