A METHOD FOR LIMITING DISCLOSURE IN MICRODATA BASED ON RANDOM NOISE AND

Survey data is often released as microdata. Survey respondents are thus subjected to the risk of reidentification and disclosure of confidential data, even when identifying information such as name and address is deleted prior to release of the data. To avoid this disclosure problem, measures for masking the data have been proposed. They include adding random error, multiplying by random error, microaggregating, data swapping, random rounding, slicing and combining subrecords. Two researchers compared those measures with respect to their masking capability and impact on key statistics. Specifically, Spruill (1983) performed an empirical comparison of additive random noise, multiplicative random error, microaggregation, random rounding and data swapping with regard to the effect of masking on key statistics. She also performed a reidentification experiment based on distance measures of absolute deviation and squared deviation. Paass (1985) also performed a reidentification experiment, based on a refined measure of identification that included discriminant analysis. He found from his experiment that the addition of random error is not an effective measure and hence proposed new masking schemes such as slicing and subrecord combination.
As has been shown in both studies, some measures maintain the unbiased values of summary statistics such as the mean and standard deviation, but others lose the unbiasedness of the data. Also, some schemes preserve the original structural relations and hence the original causal relationships; others do not. According to Paass, the combination method, which is best suited for masking, caused serious distortion of relationships among variables. This puts us squarely in a quandary as to whether to opt for protection at a grave sacrifice in the usefulness of the data.
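To make the point about summary statistics concrete, the following is a minimal sketch of plain additive random noise, the simplest of the masking measures listed above. It is not taken from Spruill's or Paass's studies and is not the scheme proposed in this paper. The function names, the normal distributions, the sample size and the noise standard deviation are all assumptions chosen for illustration, and the "earnings" values are simulated, not real survey data. Under the assumption that the noise has zero mean and is independent of the data, the mean of the masked variable stays close to the original mean, while its variance is inflated by roughly the noise variance.

```python
import random
import statistics


def mask_additive(values, noise_sd, rng):
    """Return values with independent zero-mean Gaussian noise added (illustrative only)."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def summarize(values):
    """Return (mean, population variance) of a list of numbers."""
    return statistics.fmean(values), statistics.pvariance(values)


rng = random.Random(12345)  # fixed seed so the sketch is repeatable

# Hypothetical "earnings" values; simulated, not real survey data.
original = [rng.gauss(30000.0, 8000.0) for _ in range(20000)]

# Assumed noise standard deviation of 4000 (noise variance 16,000,000).
masked = mask_additive(original, noise_sd=4000.0, rng=rng)

mean_x, var_x = summarize(original)
mean_y, var_y = summarize(masked)

print(f"mean:     original {mean_x:.1f}, masked {mean_y:.1f}")
print(f"variance: original {var_x:.0f}, masked {var_y:.0f}")
print(f"variance inflation: {var_y - var_x:.0f} (noise variance {4000.0 ** 2:.0f})")
```

In this sketch a user who knows the noise variance could subtract it from the masked variance to recover an approximately unbiased estimate of the original variance; whether such a correction is available depends on what the data releaser publishes.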
From the users' point of view, maintaining the usefulness of the data is the abiding requirement for a good masking scheme. At the Bureau of the Census, we have been faced with masking microdata files. For masking earnings data, a new scheme has been developed. The scheme is a combination of random noise inoculation and transformation. In this paper I will describe this new measure and provide examples of its application to the earnings data. Since multiple regression is the primary use of the earnings data, I will discuss the theoretical effects of masking on the regression. It should be mentioned that the power of this scheme to limit disclosure has not been fully investigated. We are presently planning to perform a reidentification experiment using the software developed by Paass' group. An advantage of the scheme proposed here is that, if users are willing to perform a multiplication to get an unbiased estimate of the second moment of the original (unmasked) variables, we can compact the data points around the mean without hampering the correlation structure. This can be done by using a small "a" value, as will be seen later. For simplicity, the derivation of the formulae is based on unweighted data.
11.1 Transformation on The Variable to Which Random Noise Was Added