Post randomisation for statistical disclosure control: Theory and implementation

This article introduces the Post RAndomisation Method (PRAM) as a method for disclosure protection of the categorical variables in a microdata file. Applying PRAM means that for each record in a microdata file the score on one or more categorical variables is changed (independently of the other records) according to a predetermined probability mechanism. Since the original data file is perturbed, it will be difficult for an intruder to identify records as corresponding to certain individuals in the population. The records in the original file are thus protected, which is the main goal of applying PRAM. On the other hand, since the probability mechanism that is used when applying PRAM is known, characteristics of the (latent) true data can be estimated from the perturbed data file. Hence it is still possible to perform all kinds of statistical analyses after PRAM has been applied.

Originally we developed PRAM as the categorical variable analogon of noise addition to continuous variables; see e.g., Fuller (1993), Hwang (1986), and Kim and Winkler (1995). Only after we had developed most of the theory did we become aware of the obvious relationship of our method with the randomised response technique applied in survey sampling; see e.g., Warner (1965, 1971) and Chaudhuri and Mukerjee (1988). This method is employed in the case of highly sensitive questions to which the respondent is not likely to respond truthfully in a face-to-face setting. By embedding the question in ...

The Post RAndomisation Method (PRAM) is a perturbative method for disclosure protection of categorical variables. Applying PRAM means that for each record in a microdata file the score on a number of variables is changed according to a specified probability mechanism. This article considers the effect of PRAM on both the safety of the data and the statistical quality of the data. When applying PRAM in practice, a number of decisions have to be made, as for example to which variables and in what way to apply PRAM. These issues are briefly discussed in this article. As an example, the result of an investigation performed at Statistics Netherlands into the possibility of protecting the Dutch National Travel Survey using PRAM is presented.
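
The two ideas in the abstract, perturbing each record independently with a known probability mechanism and then estimating characteristics of the true data from the perturbed file, can be illustrated with a minimal Python sketch. It is not taken from the article. It assumes the mechanism is given as a row-stochastic matrix P, where P[k][l] is the probability that category k is published as category l. The category names, the matrix values, and the helper names (apply_pram, estimate_original_counts) are hypothetical. The estimator relies on the expected perturbed counts being equal to the transpose of P applied to the true counts, so it requires P to be invertible.

```python
import random


def apply_pram(values, categories, P, rng):
    """Perturb each record independently: a value in category k is replaced
    by a draw from the distribution in row k of P (assumed row-stochastic)."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=P[index[v]])[0] for v in values]


def count(values, categories):
    """Frequency count per category, in the order given by `categories`."""
    return [sum(1 for v in values if v == c) for c in categories]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_original_counts(perturbed_counts, P):
    """Since E[perturbed counts] = P^T (true counts), solve P^T t = perturbed."""
    n = len(P)
    P_transposed = [[P[k][l] for k in range(n)] for l in range(n)]
    return solve(P_transposed, perturbed_counts)


# Hypothetical example: one categorical variable with three categories.
categories = ["car", "bike", "walk"]
P = [
    [0.8, 0.1, 0.1],  # a true "car" stays "car" with probability 0.8
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

original = ["car"] * 6000 + ["bike"] * 3000 + ["walk"] * 1000
perturbed = apply_pram(original, categories, P, rng)

true_counts = count(original, categories)
perturbed_counts = count(perturbed, categories)
estimated_counts = estimate_original_counts(perturbed_counts, P)

print("true counts:     ", true_counts)
print("perturbed counts:", perturbed_counts)
print("estimated counts:", [round(x) for x in estimated_counts])
```

In this sketch the perturbed counts are pulled toward a more uniform distribution, while the estimated counts are unbiased for the true ones but noisier than a direct count would be. Choosing P closer to the identity matrix reduces that extra variance at the cost of less protection.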

[1] W. Winkler, et al. Masking Microdata Files, 1995.

[2] Chris J. Skinner, et al. Estimating the re-identification risk per record in microdata, 1998.

[3] George T. Duncan, et al. Obtaining Information while Preserving Privacy: A Markov Perturbation Method for Tabular Data, 1997.

[4] A. Chaudhuri, et al. Randomized Response: Theory and Techniques, 1987.

[5] Ivan P. Fellegi, et al. A Theory for Record Linkage, 1969.

[6] Ton de Waal, et al. Statistical Disclosure Control in Practice, 1996.

[7] J. T. Hwang. Multiplicative Errors-in-Variables Models with Applications to Recent Data Released by the U.S. Department of Energy, 1986.

[8] Chris J. Skinner, et al. Categorical data analysis and misclassification, 1997.

[9] Carl-Erik Särndal, et al. Model Assisted Survey Sampling, 1997.

[10] Nabil R. Adam, et al. Security-control methods for statistical databases: a comparative study, 1989, ACM Comput. Surv.

[11] George T. Duncan, et al. Disclosure-Limited Data Dissemination, 1986.

[12] S. L. Warner, et al. Randomized response: a survey technique for eliminating evasive answer bias, 1965, Journal of the American Statistical Association.

[13] S. Reiss, et al. Data-swapping: A technique for disclosure control, 1982.

[14] S. Warner. The Linear Randomized Response Model, 1971.

[15] A. Zellner. An Introduction to Bayesian Inference in Econometrics, 1971.