Disclosure limitation through additive noise data masking: analysis of skewed sensitive data

A widely used method for confidentiality protection in statistical databases is to add zero-mean noise to sensitive attribute values. Most studies assume that the attributes are normally distributed. Using an exponential random variable as an example, this article investigates the effect of additive noise data masking on attributes with skewed distributions. Examples of exponentially distributed sensitive attributes used in statistical analysis include the time between testing HIV positive and the manifestation of AIDS symptoms, and the time between consecutive arrests for repeat offenders. We analyze both data quality and confidentiality protection. Our results indicate that skewed attributes are, in some sense, better protected than normally distributed attributes under additive noise data masking.
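
As a rough illustration of the masking scheme described above, the following minimal sketch adds zero-mean noise to a simulated exponential attribute and compares simple summary statistics before and after masking. The abstract does not specify the noise distribution or any parameter values, so the normal noise, the rate `rate`, the noise standard deviation `noise_sd`, the sample size, and the helper name `mask_with_additive_noise` are all assumptions made for illustration, not the article's actual setup.

```python
import random
import statistics


def mask_with_additive_noise(values, noise_sd, rng):
    """Return masked values y = x + e, with e drawn from Normal(0, noise_sd).

    Normal noise is an assumption here; the scheme only requires zero mean.
    """
    return [x + rng.gauss(0.0, noise_sd) for x in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical sensitive attribute: exponential with mean 1 / rate = 2.0.
rate = 0.5
original = [rng.expovariate(rate) for _ in range(20000)]

# Assumed noise level (standard deviation of the added noise).
noise_sd = 1.0
masked = mask_with_additive_noise(original, noise_sd, rng)

# Zero-mean noise leaves the mean roughly unchanged but inflates the
# variance by about noise_sd ** 2.
mean_original = statistics.fmean(original)
mean_masked = statistics.fmean(masked)
var_original = statistics.pvariance(original)
var_masked = statistics.pvariance(masked)

# Masked values can be negative even though the original attribute cannot.
negative_count = sum(1 for y in masked if y < 0)

print(f"mean: original={mean_original:.3f} masked={mean_masked:.3f}")
print(f"variance: original={var_original:.3f} masked={var_masked:.3f}")
print(f"masked values below zero: {negative_count}")
```

In this sketch the sample mean is approximately preserved, while the variance grows by roughly the noise variance. Some masked values also fall below zero, which the original non-negative attribute can never do; this is one visible way in which a skewed attribute behaves differently from a normally distributed one under masking.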