Heterogeneous k-anonymization with high utility

Among the privacy-preserving approaches known in the literature, k-anonymity remains the basis of more advanced models while still being useful as a stand-alone solution. Applying k-anonymity in practice, though, incurs a severe loss of data utility, which limits its effectiveness and reliability in real-life applications and systems. This loss of utility does not necessarily arise from an inherent drawback of the model itself, but rather from deficiencies of the algorithms used to implement it. Conventional approaches publish data in homogeneous generalized groups. A more recent data publishing scheme instead publishes the data in heterogeneous groups and achieves higher utility while ensuring the same privacy guarantees. As conventional approaches cannot anonymize data under this heterogeneous scheme, new solutions are required. In this paper we provide a set of algorithms that ensure high-utility k-anonymity by solving an equivalent graph processing problem.
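For orientation, the conventional homogeneous scheme mentioned above can be illustrated with a minimal sketch. It is not one of the algorithms provided in the paper: the function name `is_k_anonymous` and the toy records are hypothetical, and the check only covers the homogeneous case, in which every published combination of generalized quasi-identifier values must be shared by at least k records.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """Return True if every distinct quasi-identifier tuple occurs at least k times.

    `records` is a list of tuples of (possibly generalized) quasi-identifier
    values. This is a hypothetical helper for the homogeneous scheme only.
    """
    counts = Counter(records)
    return all(count >= k for count in counts.values())


# Hypothetical raw table: (age, ZIP code). Every tuple is unique.
raw = [("23", "13053"), ("27", "13068"), ("31", "14850"), ("38", "14853")]

# The same table generalized into two homogeneous groups of size 2.
homogeneous = [
    ("20-29", "130**"), ("20-29", "130**"),
    ("30-39", "148**"), ("30-39", "148**"),
]

print(is_k_anonymous(raw, 2))          # raw records are all distinct
print(is_k_anonymous(homogeneous, 2))  # each group holds two records
```

In this toy table, every record in a group receives the same coarse ranges, which is where the utility loss of the homogeneous scheme comes from.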
