Algorithm-safe privacy-preserving data publishing

This paper develops tools for eliminating algorithm-based disclosure from existing privacy-preserving data publishing algorithms. We first show that the space of algorithm-based disclosure is larger than previously believed, making such disclosure more prevalent and more dangerous. We then formally define Algorithm-Safe Publishing (ASP) to model the threats from algorithm-based disclosure. To eliminate algorithm-based disclosure from existing data publishing algorithms, we propose two generic tools for revising their design: the worst-case eligibility test and stratified pick-up. We demonstrate the effectiveness of these tools by using them to transform two popular l-diversity algorithms, Mondrian and Hilb, into SP-Mondrian and SP-Hilb, which are algorithm-safe. Extensive experiments evaluate SP-Mondrian and SP-Hilb in terms of data utility and efficiency.
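For readers unfamiliar with the l-diversity guarantee that Mondrian and Hilb target, the following is a minimal sketch of a check for the simplest ("distinct") variant: every group of records sharing the same generalized quasi-identifier values must contain at least l distinct sensitive values. The function name, the field names, and the toy table are hypothetical and chosen only for illustration; this is not the worst-case eligibility test or stratified pick-up, which the abstract names but does not specify.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every equivalence class (records sharing the same
    quasi-identifier values) holds at least l distinct sensitive values.

    Illustrative helper only; names and signature are assumptions.
    """
    groups = defaultdict(set)
    for row in records:
        # Records with identical generalized quasi-identifiers form one class.
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical published table with generalized quasi-identifiers.
published = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

# The second class contains only "flu", so the table is not 2-diverse.
two_diverse = is_distinct_l_diverse(published, ["age", "zip"], "disease", 2)
```

A check like this inspects only the published table. Algorithm-based disclosure concerns what an adversary can additionally infer from knowing how the publishing algorithm chose its output, so a table that passes such a check is not necessarily algorithm-safe.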
