Distance makes the types grow stronger: a calculus for differential privacy

We want assurances that sensitive information will not be disclosed when aggregate data derived from a database is published. Differential privacy offers a strong statistical guarantee that the effect of the presence of any individual in a database will be negligible, even when an adversary has auxiliary knowledge. Much of the prior work in this area consists of proving algorithms to be differentially private one at a time; we propose to streamline this process with a functional language whose type system automatically guarantees differential privacy, allowing the programmer to write complex privacy-safe query programs in a flexible and compositional way. The key novelty is the way our type system captures function sensitivity, a measure of how much a function can magnify the distance between similar inputs: well-typed programs not only can't go wrong, they can't go too far on nearby inputs. Moreover, by introducing a monad for random computations, we can show that the established definition of differential privacy falls out naturally as a special case of this soundness principle. We develop examples including known differentially private algorithms, privacy-aware variants of standard functional programming idioms, and compositionality principles for differential privacy.
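
To make the notion of function sensitivity concrete, here is a minimal Python sketch. It is not the paper's calculus or type system. It only illustrates the underlying idea on a counting query, under assumptions that are not in the abstract: the helper names (`count_over_40`, `db_distance`, `laplace_noise`, `noisy_count`) are hypothetical, the database metric is taken to be the size of the multiset symmetric difference, and noise is added with the standard Laplace mechanism. A counting query is 1-sensitive under this metric, so Laplace noise of scale 1/epsilon is the usual way to make its output epsilon-differentially private.

```python
import random
from collections import Counter


def count_over_40(db):
    """Counting query (hypothetical example): records with age > 40."""
    return sum(1 for age in db if age > 40)


def db_distance(a, b):
    """Assumed metric: size of the symmetric difference of two multisets."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(db, epsilon, rng):
    """Laplace mechanism for a 1-sensitive query: noise scale is 1/epsilon."""
    sensitivity = 1.0  # adding or removing one record changes the count by at most 1
    return count_over_40(db) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    db = [23, 45, 67, 31]
    neighbor = db + [52]  # differs from db by one individual
    d = db_distance(db, neighbor)
    gap = abs(count_over_40(db) - count_over_40(neighbor))
    print("input distance:", d, "output distance:", gap)
    print("noisy answer:", noisy_count(db, epsilon=0.5, rng=random.Random(0)))
```

The check `gap <= 1 * d` is the "can't go too far on nearby inputs" property for this one query. The sketch verifies it on a single pair of databases, whereas a sensitivity type system would establish it for all inputs at compile time.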
