Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data

Personally identifiable information (PII) affects individual privacy because combinations of PII may uniquely identify individuals in published data. User PII such as age, race, gender, and zip code contains private information that may help an adversary determine the user to whom it relates. Each type of PII reveals identity to a different degree, and some types are highly identity vulnerable. More vulnerable types of PII enable unique identification more easily, and their presence in published data increases privacy risk. Existing privacy models treat all types of PII equally with respect to identity revelation and focus mainly on hiding a user's PII in a crowd of other users. Ignoring the identity vulnerability of each type of PII during anonymization prevents fine-grained protection of user privacy. This paper proposes a new anonymization scheme that considers the identity vulnerability of PII to protect user privacy effectively. To anonymize data, generalization is performed adaptively based on both the identity vulnerability of PII and diversity. This adaptive generalization produces anonymized data that guards against identity and private information disclosure while maximizing the utility of the data for performing analyses and building classification models. Additionally, the proposed scheme has low computational overhead. Simulation results demonstrate the effectiveness of the scheme and support these claims.
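
The idea of generalizing more vulnerable attributes more coarsely can be illustrated with a minimal sketch. The abstract does not state how identity vulnerability or diversity is measured, so everything below is an assumption made for illustration: Simpson's index of diversity serves as a stand-in vulnerability score, and the function names, the 0.75 threshold, and the interval widths are hypothetical rather than taken from the paper.

```python
from collections import Counter


def simpson_diversity(values):
    """Simpson's index of diversity: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)).

    Used here as an ASSUMED proxy for identity vulnerability: the more
    evenly an attribute's values are spread, the closer the score is to 1
    and the more easily a single value singles out one record.
    """
    n = len(values)
    if n < 2:
        return 0.0
    counts = Counter(values)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def vulnerability_scores(records, attributes):
    """Score each attribute by the diversity of its values (hypothetical helper)."""
    return {a: simpson_diversity([r[a] for r in records]) for a in attributes}


def pick_width(score, threshold=0.75, narrow=5, wide=20):
    """Choose a coarser interval for more vulnerable attributes.

    The threshold and widths are illustrative values, not from the paper.
    """
    return wide if score >= threshold else narrow


def generalize_number(value, width):
    """Replace a number with the closed interval of the given width containing it."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


if __name__ == "__main__":
    records = [
        {"age": 23, "gender": "F"},
        {"age": 27, "gender": "M"},
        {"age": 35, "gender": "F"},
        {"age": 41, "gender": "F"},
    ]
    scores = vulnerability_scores(records, ["age", "gender"])
    width = pick_width(scores["age"])
    for r in records:
        print(generalize_number(r["age"], width), r["gender"])
```

In this toy table every age is distinct, so age receives the highest score and is generalized with the wider interval, while gender, whose values repeat, scores lower. A real scheme would also have to check a diversity condition on the sensitive values within each resulting group, which this sketch omits.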
