A framework for utility enhanced incomplete microdata anonymization

Incomplete microdata, i.e., microdata with missing values, are very common in real-world datasets. However, existing anonymization techniques, which were developed for complete datasets, suffer serious information loss on incomplete microdata because of missing value pollution. In this paper, we propose a framework for utility-enhanced anonymization of incomplete microdata to address this issue. First, we study the properties of missing value pollution under generalization. Guided by these properties, we develop two top-down anonymization algorithms that preserve data utility on incomplete microdata. Extensive experiments on real-world datasets show that our techniques outperform state-of-the-art techniques in terms of both information loss and missing value pollution.
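As a rough illustration of the problem described above, the following Python sketch shows one plausible reading of missing value pollution. It is not the paper's definition or algorithm. It assumes that a missing value (represented here as `None`) can only be generalized to the fully suppressed value `*`, so every known value of that attribute in the same group is suppressed as well. The function names `generalize_group` and `polluted_cells`, and the toy records, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, Union

Record = Sequence[Optional[int]]
Generalized = Union[str, Tuple[int, int]]


def generalize_group(group: List[Record]) -> List[Generalized]:
    """Generalize each attribute of a group to a (min, max) range.

    Assumption: if any record in the group is missing the attribute,
    the whole attribute is suppressed to '*' for the group.
    """
    result: List[Generalized] = []
    for j in range(len(group[0])):
        column = [record[j] for record in group]
        if any(value is None for value in column):
            result.append("*")
        else:
            result.append((min(column), max(column)))
    return result


def polluted_cells(group: List[Record]) -> int:
    """Count known values that are suppressed only because another
    record in the same group is missing that attribute."""
    count = 0
    for j in range(len(group[0])):
        column = [record[j] for record in group]
        if any(value is None for value in column):
            count += sum(value is not None for value in column)
    return count


# Toy records with two attributes: (age, zip code).
complete_group = [(25, 4350), (26, 4352)]
mixed_group = [(25, 4350), (27, None)]

print(generalize_group(complete_group), polluted_cells(complete_group))
print(generalize_group(mixed_group), polluted_cells(mixed_group))
```

Under this assumption, grouping a complete record with an incomplete one discards a known zip code that a grouping of complete records would have kept as a narrow range. This is the kind of utility loss that a grouping strategy aware of missing values would try to avoid.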
