Differentially Private SQL with Bounded User Contribution

Abstract: Differential privacy (DP) provides formal guarantees that the output of a database query does not reveal too much information about any individual present in the database. While many differentially private algorithms have been proposed in the scientific literature, there are only a few end-to-end implementations of differentially private query engines. Crucially, existing systems assume that each individual is associated with at most one database record, which is unrealistic in practice. We propose a generic and scalable method to perform differentially private aggregations on databases, even when individuals can each be associated with arbitrarily many rows. We express this method as an operator in relational algebra, and implement it in an SQL engine. To validate this system, we test the utility of typical queries on industry benchmarks, and verify its correctness with a stochastic test framework we developed. We highlight the promises and pitfalls we encountered when deploying such a system in practice, and we publish its core components as open-source software.
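The idea of bounding each individual's contribution before aggregating can be illustrated with a minimal sketch. This is not the paper's operator or its open-source implementation: the function names, the `(user_id, value)` row layout, and the parameters `max_rows_per_user`, `lower`, and `upper` are assumptions made for illustration. The sketch keeps at most a fixed number of rows per user, clamps each value to a range, and adds Laplace noise scaled to the resulting per-user sensitivity.

```python
import random
from collections import defaultdict


def bounded_total(rows, max_rows_per_user, lower, upper, rng):
    """Sum values after limiting what any single user can contribute.

    rows: iterable of (user_id, value) pairs (hypothetical layout).
    """
    per_user = defaultdict(list)
    for user_id, value in rows:
        per_user[user_id].append(value)

    total = 0.0
    for values in per_user.values():
        # Keep at most max_rows_per_user rows per user, chosen at random.
        if len(values) > max_rows_per_user:
            values = rng.sample(values, max_rows_per_user)
        # Clamp each kept value into [lower, upper].
        total += sum(min(max(v, lower), upper) for v in values)
    return total


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_sum(rows, epsilon, max_rows_per_user, lower, upper, rng=None):
    """Illustrative epsilon-DP sum with user-level contribution bounds.

    Floating-point Laplace sampling like this is known to leak
    information; a real system needs a hardened noise sampler.
    """
    rng = rng or random.Random()
    total = bounded_total(rows, max_rows_per_user, lower, upper, rng)
    # Adding or removing one user changes the total by at most this much.
    sensitivity = max_rows_per_user * max(abs(lower), abs(upper))
    return total + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rows = [("a", 5), ("a", 5), ("a", 5), ("b", 100)]
    print(dp_sum(rows, epsilon=1.0, max_rows_per_user=2, lower=0, upper=10))
```

Without the per-user bound, a single individual with many rows could shift the sum arbitrarily, and no finite noise scale would hide their presence. The bound trades some bias (dropped or clamped rows) for a finite sensitivity.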
