On syntactic anonymity and differential privacy

Recently, there has been a growing debate over approaches for handling and analyzing private data. Research has identified issues with syntactic anonymity models, and differential privacy has been promoted as the answer to privacy-preserving data mining. We discuss the issues involved in, and the criticisms of, both approaches, and conclude that both have their place. We identify research directions that will enable greater access to data while improving privacy guarantees.
