G-Model: A Novel Approach to Privacy-Preserving 1:M Microdata Publication

Public availability of electronic health records (EHRs) raises major privacy concerns, because such data contain confidential personal information about individuals. Publishing such data must therefore be accompanied by privacy-preserving techniques that avoid, or at least minimize, privacy breaches. Privacy preservation becomes more challenging when the data have multiple sensitive attributes (SAs), and the risks increase further when an individual has multiple records (1:M) in a dataset, a typical situation with EHRs. To address these issues, the research community has proposed the 1:M generalization and l-anatomy methodologies. However, these models fail to provide optimal privacy protection, data utility, and security against certain types of attacks, such as gender-specific SA attacks. In this paper, we propose a generic 1:M data privacy model, called G-model, which provides guaranteed data privacy with high data utility. G-model maintains separate groups and caches of male and female SAs, thus protecting privacy against gender-specific SA attacks. Furthermore, G-model avoids generalization and therefore incurs no information loss. Experiments on three real-world datasets (Adult, Informs, and YouTube) show that the proposed model is more efficient and better at privacy protection than existing models from the literature.
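To make the 1:M setting concrete, the following minimal sketch merges the multiple records of each individual into a single set of SA values and then separates individuals by gender. It is an illustration under stated assumptions only, not the G-model algorithm, whose grouping and caching steps are not specified in this abstract; the toy records, the column layout, and the helper names `collect_sensitive_values` and `partition_by_gender` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical 1:M microdata: (person_id, gender, age, diagnosis).
# Individuals p1 and p2 each contribute more than one record.
records = [
    ("p1", "F", 34, "ovarian cyst"),
    ("p1", "F", 34, "flu"),
    ("p2", "M", 36, "prostatitis"),
    ("p3", "F", 35, "flu"),
    ("p2", "M", 36, "asthma"),
]


def collect_sensitive_values(rows):
    """Merge each individual's records into (gender, set of SA values)."""
    merged = {}
    for person_id, gender, _age, diagnosis in rows:
        entry = merged.setdefault(person_id, (gender, set()))
        entry[1].add(diagnosis)
    return merged


def partition_by_gender(merged):
    """Separate individuals by gender so that a gender-specific SA value
    (assumed here: 'ovarian cyst', 'prostatitis') never shares a group
    with individuals of the other gender."""
    parts = defaultdict(dict)
    for person_id, (gender, values) in merged.items():
        parts[gender][person_id] = values
    return dict(parts)


merged = collect_sensitive_values(records)
partitions = partition_by_gender(merged)
```

If male and female individuals shared one group, an adversary could attribute a value such as "prostatitis" to the male members alone, which shrinks the set of candidates; keeping the partitions separate is one simple way to picture why gender-aware grouping matters.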
