A rigorous and customizable framework for privacy

In this paper we introduce a new and general privacy framework called Pufferfish. The Pufferfish framework can be used to create new privacy definitions customized to the needs of a given application. Its goal is to allow experts in an application domain, who frequently lack expertise in privacy, to develop rigorous privacy definitions for their data-sharing needs. The framework can also be used to study existing privacy definitions. We illustrate its benefits with several applications: we use it to formalize and prove the statement that differential privacy assumes independence between records; we use it to define and study the notion of composition in a broader context than before; we show how to apply it to protect unbounded continuous attributes and aggregate information; and we show how to use it to rigorously account for prior data releases.
