Hierarchical anonymization algorithms against background knowledge attack in data releasing

Highlights:
- We define a privacy model based on k-anonymity and one of its strong refinements to prevent the background knowledge attack.
- We propose two hierarchical anonymization algorithms to satisfy our privacy model.
- Our algorithms outperform state-of-the-art anonymization algorithms in terms of utility and privacy.
- We extend an information loss measure to capture the data inaccuracies caused by records that do not fit into any equivalence class.

Abstract: Preserving privacy in the presence of an adversary's background knowledge is essential in data publishing. The k-anonymity model, while protecting identity, does not protect against attribute disclosure. β-likeness, one of the strong refinements of k-anonymity, does not protect against identity disclosure, and neither model protects against attacks that exploit background knowledge. This research proposes two approaches for generating k-anonymous β-likeness datasets that protect against both identity and attribute disclosure and that prevent attacks exploiting correlations between quasi-identifiers (QIs) and sensitive attribute values as the adversary's background knowledge. In particular, two hierarchical anonymization algorithms are proposed. Both algorithms apply agglomerative clustering techniques in their first stage to generate clusters of records whose probability distributions, as derived from the background knowledge, are similar. In the next phase, k-anonymity and β-likeness are enforced to prevent identity and attribute disclosure. Our extensive experiments demonstrate that the proposed algorithms outperform other state-of-the-art anonymization algorithms in terms of privacy and data utility, and that the number of unpublished records produced by our algorithms is smaller than that of the others. Because well-known information loss metrics fail to measure precisely the data inaccuracies that stem from removing records that cannot be published in any equivalence class, this research also introduces an extension of the Global Certainty Penalty metric that accounts for unpublished records.
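As a rough illustration of the two-stage structure described above, the sketch below first clusters records by the similarity of their background-knowledge distributions over the sensitive attribute (Jensen-Shannon distance with average-linkage agglomerative clustering) and then checks k-anonymity and basic β-likeness for a candidate equivalence class. This is a minimal sketch under assumed inputs (the per-record distributions `bk_dist`, the table-wide frequencies `overall_freq`, and parameters `k` and `beta`), not the authors' implementation.

```python
# Minimal sketch (not the paper's implementation): stage 1 groups records whose
# background-knowledge distributions over the sensitive attribute are similar;
# stage 2 checks k-anonymity and basic beta-likeness for one candidate class.
import numpy as np
from scipy.spatial.distance import jensenshannon, pdist
from scipy.cluster.hierarchy import linkage, fcluster


def cluster_by_background_knowledge(bk_dist: np.ndarray, n_clusters: int) -> np.ndarray:
    """Agglomerative clustering of records with similar P(SA | background knowledge).

    bk_dist: (n_records, n_sensitive_values) array; each row is a probability
    distribution over the sensitive attribute for one record.
    Returns one cluster label per record.
    """
    # Pairwise Jensen-Shannon distances between the per-record distributions.
    dists = pdist(bk_dist, metric=lambda p, q: jensenshannon(p, q))
    # Average-linkage hierarchical clustering on the condensed distance matrix.
    tree = linkage(dists, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def satisfies_k_and_beta(ec_sensitive: np.ndarray, overall_freq: dict,
                         k: int, beta: float) -> bool:
    """Check k-anonymity (size) and basic beta-likeness for one equivalence class."""
    if len(ec_sensitive) < k:
        return False
    values, counts = np.unique(ec_sensitive, return_counts=True)
    for v, c in zip(values, counts):
        p = c / len(ec_sensitive)   # frequency of value v inside the class
        q = overall_freq[v]         # frequency of value v in the whole table
        if (p - q) / q > beta:      # relative-difference form of beta-likeness
            return False
    return True
```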
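To make the metric extension concrete, one plausible form (an assumption for illustration only; the paper defines the actual extension) starts from the standard Global Certainty Penalty over the published equivalence classes G, with d QI attributes and N records, and charges each of the |S| unpublished records the maximum per-record penalty d:

```latex
% Hypothetical sketch of a GCP variant that accounts for unpublished records S.
\[
  \mathrm{GCP}_{\mathrm{ext}}(T^{*}) \;=\;
  \frac{\sum_{G} |G|\cdot \mathrm{NCP}(G) \;+\; d\cdot |S|}{d \cdot N}
\]
```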
