On-the-Fly Hierarchies for Numerical Attributes in Data Anonymization

This paper presents a method for dynamically creating hierarchies for numerical quasi-identifier attributes. The resulting hierarchies can be used for generalization in microdata k-anonymization, or to let users define generalization boundaries for constrained k-anonymity. The hierarchy for a numerical attribute is constructed by hierarchical agglomerative clustering of that attribute's values in the dataset to be anonymized. The resulting tree therefore reflects the closeness and clustering tendency of the attribute's values in the dataset. Because of this property, microdata anonymized through generalization based on these on-the-fly hierarchies retains good quality, and the information loss incurred by anonymization stays within reasonable bounds, as shown experimentally.
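
To make the construction concrete, the following is a minimal Python sketch of building a hierarchy by agglomerative clustering of one attribute's values. It is an illustration under stated assumptions, not the method as specified in the paper: the merge criterion (distance between cluster centroids) and the names `Node`, `build_hierarchy`, and `generalization_path` are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One node of the hierarchy: the interval [low, high] it covers."""
    low: float
    high: float
    count: int    # number of records covered by this node
    total: float  # sum of the covered values, used to compute the centroid
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def centroid(self) -> float:
        return self.total / self.count


def build_hierarchy(values: List[float]) -> Node:
    """Agglomeratively merge the values of one attribute into a binary tree.

    Assumption: clusters are merged by smallest centroid distance. Clusters
    are disjoint intervals kept in sorted order, so their centroids are also
    sorted and the closest pair is always a pair of neighbours.
    """
    if not values:
        raise ValueError("values must be non-empty")
    # Start with one leaf per distinct value, in ascending order.
    clusters = [Node(v, v, c, v * c) for v, c in sorted(Counter(values).items())]
    while len(clusters) > 1:
        # Index of the adjacent pair with the smallest centroid gap
        # (ties resolve to the leftmost pair).
        i = min(
            range(len(clusters) - 1),
            key=lambda j: clusters[j + 1].centroid - clusters[j].centroid,
        )
        a, b = clusters[i], clusters[i + 1]
        merged = Node(a.low, b.high, a.count + b.count, a.total + b.total, a, b)
        clusters[i:i + 2] = [merged]
    return clusters[0]


def generalization_path(root: Node, value: float) -> List[Tuple[float, float]]:
    """Return the intervals covering `value`, from its leaf up to the root."""
    path = []
    node = root
    while True:
        if not (node.low <= value <= node.high):
            raise ValueError(f"{value} does not occur in the hierarchy")
        path.append((node.low, node.high))
        if node.left is None:  # reached a leaf
            break
        node = node.left if value <= node.left.high else node.right
    return path[::-1]


ages = [20, 21, 22, 40, 41, 70]
root = build_hierarchy(ages)
print(generalization_path(root, 21))
print(generalization_path(root, 70))
```

Each interval on a value's path is a candidate generalization for that value; in this sketch, an anonymization algorithm would replace a value with one of these intervals, and a user-defined generalization boundary would correspond to a node above which the value may not be generalized.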
