Bayesian disclosure risk assessment: predicting small frequencies in contingency tables

We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focused on regions of high probability. Our approach is Bayesian and provides posterior predictive probabilities of identification risk. By incorporating model uncertainty in our analysis, we can provide more realistic estimates of disclosure risk for individual cell counts than are provided by methods which ignore the multivariate structure of the data set. Copyright 2007 Royal Statistical Society.
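To make the notion of a posterior predictive identification risk concrete, the following is a minimal sketch for a single cell under a deliberately simple Poisson-gamma model. It is not the method of the paper, which accounts for model uncertainty and the multivariate structure of the table. The assumptions, none of which come from the abstract, are: the population cell count F is Poisson with rate lam, lam has a Gamma(a, b) prior (shape a, rate b), and each population member is sampled independently with probability pi, so the sample count f is Poisson with rate pi * lam. The function names and parameters are hypothetical.

```python
import math


def unsampled_count_pmf(r, f, pi, a, b):
    """Posterior predictive Pr(F - f = r | f) under the assumed model.

    Assumed model (illustrative only): lam ~ Gamma(shape=a, rate=b),
    f | lam ~ Poisson(pi * lam), and F - f | lam ~ Poisson((1 - pi) * lam).
    The posterior is lam | f ~ Gamma(a + f, b + pi), so F - f given f is
    negative binomial with shape a + f and success probability
    p = (b + pi) / (b + 1).
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    alpha = a + f
    p = (b + pi) / (b + 1.0)
    log_pmf = (
        math.lgamma(alpha + r) - math.lgamma(alpha) - math.lgamma(r + 1)
        + alpha * math.log(p) + r * math.log1p(-p)
    )
    return math.exp(log_pmf)


def risk_sample_unique(pi, a, b):
    """Pr(F = 1 | f = 1): probability that a sample unique is a population unique."""
    return ((b + pi) / (b + 1.0)) ** (a + 1)


# Example with made-up values: a 10% sample and a Gamma(1, 1) prior.
risk = risk_sample_unique(pi=0.1, a=1.0, b=1.0)
```

In this sketch the risk for a sample-unique cell rises towards 1 as the sampling fraction pi approaches 1, since fewer unsampled population members remain who could share the cell.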
