Generalizing Data to Provide Anonymity when Disclosing Information

The proliferation of information on the Internet and access to fast computers with large storage capacities has increased the volume of information collected and disseminated about individuals. The existence os these other data sources makes it much easier to re-identify individuals whose private information is released in data believed to be anonymous. At the same time, increasing demands are made on organizations to release individualized data rather than aggregate statistical information. Even when explicit identi ers, such as name and phone number, are removed or encrypted when releasing individualized data, other characteristic data, which we term quasi-identi ers, can exist which allow the data recipient to re-identify individuals to whom the data refer. In this paper, we provide a computational disclosure technique for releasing information from a private table such that the identity of any individual to whom the released data refer cannot be de nitively recognized. Our approach protects against linking to other data. It is based on the concepts of generalization, by which stored values can be replaced with semantically consistent and truthful but less precise alternatives, and of k-anonymity . A table is said to provide k-anonymity when the contained data do not allow the recipient to associate the released information to a set of individuals smaller than k. We introduce the notions of generalized table and of minimal generalization of a table with respect to a k-anonymity requirement. As an optimization problem, the objective is to minimally distort the data while providing adequate protection. We describe an algorithm that, given a table, e ciently computes a preferred minimal generalization to provide anonymity.