Hybrid k-Anonymity

Abstract: Anonymization-based privacy protection ensures that published data cannot be linked back to an individual. The most common approach in this domain is to apply generalizations to the private data in order to maintain a privacy standard such as k-anonymity. While generalization-based techniques preserve truthfulness, the relatively small output space of such techniques often results in unacceptable utility loss, especially when privacy requirements are strict. In this paper, we introduce hybrid generalizations, which are formed not only by generalizations but also by a data relocation mechanism. Data relocation involves changing certain data cells to further populate small groups of tuples that are indistinguishable from each other. This allows us to create anonymizations of finer granularity that conform to the underlying privacy standards. Data relocation serves as a tradeoff between utility and truthfulness, and we provide an input parameter to control this tradeoff. Experiments on real data show that allowing a relatively small number of relocations increases utility with respect to heuristic metrics and query answering accuracy.
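To make the idea of relocation concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each tuple is already a generalized quasi-identifier vector, treats tuples with identical vectors as one indistinguishable group, and uses a toy greedy rule to populate groups smaller than k. The names `is_k_anonymous`, `relocate`, and `max_relocations` are hypothetical; `max_relocations` stands in for the input parameter that bounds how much truthfulness is traded for utility.

```python
from collections import Counter


def is_k_anonymous(rows, k):
    """Return True if every distinct quasi-identifier vector occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())


def relocate(rows, k, max_relocations):
    """Toy relocation heuristic (an assumption, not the paper's method).

    Repeatedly picks a group with fewer than k tuples and overwrites the
    cells of the nearest tuple from another group so that it joins the
    small group. Stops when no small group remains or the budget is spent.
    Returns the modified rows and the number of relocated tuples.
    """
    rows = list(rows)
    moved = 0
    while moved < max_relocations:
        counts = Counter(rows)
        small = [g for g, c in counts.items() if c < k]
        if not small:
            break
        # Fill the small group that is closest to reaching size k.
        target = max(small, key=lambda g: counts[g])
        # Donors: tuples in groups with spare members, or in other small groups.
        donors = [i for i, r in enumerate(rows)
                  if r != target and counts[r] != k]
        if not donors:
            break
        # Nearest donor = fewest cells that have to be changed.
        i = min(donors,
                key=lambda j: sum(a != b for a, b in zip(rows[j], target)))
        rows[i] = target
        moved += 1
    return rows, moved


# Hypothetical generalized table: (age range, sex).
table = [("20-29", "F"), ("20-29", "F"), ("20-29", "F"), ("30-39", "F")]
print(is_k_anonymous(table, 2))           # the ("30-39", "F") group has one tuple
hybrid, moved = relocate(table, k=2, max_relocations=1)
print(is_k_anonymous(hybrid, 2), moved)
```

In this sketch a single relocation avoids generalizing the age attribute any further, at the cost of one tuple whose age range no longer reflects the original record; a budget of zero reduces to checking the purely generalized table.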
