A framework for adaptive differential privacy

Differential privacy is a widely studied theory for analyzing sensitive data with a strong privacy guarantee—any change in an individual's data can have only a small statistical effect on the result—and a growing number of programming languages now support differentially private data analysis. A common shortcoming of these languages is poor support for adaptivity. In practice, a data analyst rarely wants to run just one function over a sensitive database, nor even a predetermined sequence of functions with fixed privacy parameters; rather, she wants to engage in an interaction where, at each step, both the choice of the next function and its privacy parameters are informed by the results of prior functions. Existing languages support this scenario using a simple composition theorem, which often gives rather loose bounds on the actual privacy cost of composite functions, substantially reducing how much computation can be performed within a given privacy budget. The theory of differential privacy includes other theorems with much better bounds, but these have not yet been incorporated into programming languages. We propose a novel framework for adaptive composition that is elegant, practical, and implementable. It consists of a reformulation, based on typed functional programming, of privacy filters, together with a concrete realization of this framework in the design and implementation of a new language, called Adaptive Fuzz. Adaptive Fuzz transplants the core static type system of Fuzz to the adaptive setting by wrapping the Fuzz typechecker and runtime system in an outer adaptive layer, allowing Fuzz programs to be conveniently constructed and typechecked on the fly. We describe an interpreter for Adaptive Fuzz and report results from two case studies demonstrating its effectiveness for implementing common statistical algorithms over real data sets.
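To make the composition issue concrete, the following is a minimal sketch (not code from the paper, and not the Adaptive Fuzz language itself) of budget tracking under the simple composition theorem, where the privacy costs of successive queries simply add up. A privacy filter generalizes this check so that it remains valid even when each query's epsilon is chosen adaptively based on earlier results; all names here are illustrative.

```python
class SimpleCompositionFilter:
    """Accepts a query's privacy cost only while the running total
    stays within the global budget; otherwise rejects (HALT)."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon  # the global privacy budget
        self.spent = 0.0            # epsilon consumed so far

    def try_spend(self, epsilon: float) -> bool:
        # Simple composition: the cumulative cost of a sequence of
        # epsilon_i-DP queries is bounded by the sum of the epsilon_i.
        if self.spent + epsilon <= self.total:
            self.spent += epsilon
            return True
        return False  # budget exhausted: the analysis must stop


f = SimpleCompositionFilter(total_epsilon=1.0)
print(f.try_spend(0.4))  # True  (0.4 spent)
print(f.try_spend(0.4))  # True  (0.8 spent)
print(f.try_spend(0.4))  # False (1.2 would exceed the budget)
```

The loose bounds the abstract mentions arise exactly here: summing epsilons is valid but pessimistic, and advanced composition theorems permit many more queries under the same global budget.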
