A novel privacy preserving method for data publication

Abstract Privacy is a growing concern in the publication of datasets that contain sensitive information. Preventing privacy disclosure and providing useful information to legitimate users for data mining are conflicting goals. Generalization and randomized response methods were proposed in the database community to tackle this problem. However, both assume the same prior belief for all transactions, which may be an inaccurate model and lead to privacy breaches. In addition, generalization and randomized response methods usually require a privacy-controlling parameter to set the tradeoff between privacy and data quality, which may put data publishers in a dilemma. In this paper, we propose a novel privacy-preserving method for data publication, based on conditional probability distributions and machine learning techniques, that can achieve different prior beliefs for different transactions. A basic cross sampling algorithm is designed for the setting of a single sensitive attribute and a complete cross sampling algorithm for the setting of multiple sensitive attributes; an improved complete algorithm, which uses Gibbs sampling, is developed to enhance data utility when data are insufficient. Our method offers a stronger privacy guarantee while, as shown in extensive experiments, retaining better data utility.
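
To make the role of a privacy-controlling parameter concrete, the following is a minimal sketch of classic binary randomized response (Warner's scheme). It is not the cross sampling method proposed in this paper; the function names `randomize` and `estimate_proportion` and the parameter `p` are illustrative assumptions. Each respondent reports the true sensitive bit with probability `p` and the flipped bit otherwise, and the publisher inverts the known noise to estimate the true proportion.

```python
import random


def randomize(bit, p, rng):
    """Report the true bit with probability p, otherwise report its flip.

    p is the (assumed) privacy-controlling parameter: values near 0.5 give
    more privacy and less utility, values near 0 or 1 give the opposite.
    """
    return bit if rng.random() < p else 1 - bit


def estimate_proportion(reports, p):
    """Unbiased estimate of the true proportion of 1s from noisy reports.

    Since P(report = 1) = pi * p + (1 - pi) * (1 - p), solving for pi gives
    pi = (observed - (1 - p)) / (2 * p - 1). Undefined for p = 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information; cannot estimate")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 30% of records carry the sensitive value.
    true_bits = [1] * 6000 + [0] * 14000
    for p in (0.6, 0.75, 0.9):
        reports = [randomize(b, p, rng) for b in true_bits]
        print(p, round(estimate_proportion(reports, p), 3))
```

In this sketch a single `p` is applied to every record, which mirrors the uniform-prior assumption that the abstract criticizes; the publisher must also pick `p` by hand, which is the dilemma described above.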
