Anonymization of Sensitive Quasi-Identifiers for l-Diversity and t-Closeness

Many methods for privacy-preserving data mining have been proposed. Most of them assume that quasi-identifiers (QIDs) can be separated from sensitive attributes. For instance, they assume that address, job, and age are QIDs but not sensitive attributes, and that a disease name is a sensitive attribute but not a QID. In practice, however, all of these attributes can serve as both sensitive attributes and QIDs. In this paper, we refer to such attributes as sensitive QIDs, and we propose two novel privacy models, (l1, …, lq)-diversity and (t1, …, tq)-closeness, together with a method that can handle sensitive QIDs. Our method consists of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, which is run by data holders, is simple but effective, whereas the reconstruction algorithm, which is run by data analyzers, can be tailored to each data analyzer's objective. We evaluated the proposed method experimentally using real data sets.
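
The abstract does not define (l1, …, lq)-diversity, so the following is only a rough illustration of the general idea of giving each attribute its own diversity threshold. It checks the standard distinct l-diversity notion (each group must contain at least l distinct values) separately for several attributes. The function name `distinct_diversity`, the `group` field, and the toy records are all hypothetical and are not taken from the paper; the paper's actual model and algorithms may differ.

```python
from collections import defaultdict


def distinct_diversity(records, group_key, thresholds):
    """Return {group: [attributes with fewer distinct values than required]}.

    thresholds maps each attribute name to its own minimum number of
    distinct values per group (an assumed reading of per-attribute l_i).
    """
    # Collect the set of distinct values seen per group and attribute.
    values = defaultdict(lambda: defaultdict(set))
    for rec in records:
        group = rec[group_key]
        for attr in thresholds:
            values[group][attr].add(rec[attr])

    # Report only the groups that miss at least one threshold.
    violations = {}
    for group, per_attr in values.items():
        bad = [a for a, l in thresholds.items() if len(per_attr[a]) < l]
        if bad:
            violations[group] = bad
    return violations


# Toy data (invented for illustration): both job and disease are treated
# as attributes that need diversity within each group.
records = [
    {"group": 1, "job": "nurse", "disease": "flu"},
    {"group": 1, "job": "clerk", "disease": "flu"},
    {"group": 2, "job": "nurse", "disease": "cold"},
    {"group": 2, "job": "clerk", "disease": "flu"},
]
thresholds = {"job": 2, "disease": 2}
violations = distinct_diversity(records, "group", thresholds)
print(violations)
```

In this toy input, group 1 contains two jobs but only one disease value, so it is reported as failing the `disease` threshold, while group 2 satisfies both thresholds.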
