Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism

We give the first polynomial-time algorithm to estimate the mean of a d-variate probability distribution with bounded covariance from Õ(d) independent samples subject to pure differential privacy. Prior algorithms for this problem either incur exponential running time, require Ω(d^{1.5}) samples, or satisfy only the weaker concentrated or approximate differential privacy conditions. In particular, all prior polynomial-time algorithms require d^{1+Ω(1)} samples to guarantee small privacy loss with “cryptographically” high probability, 1 − 2^{−d^{Ω(1)}}, while our algorithm retains Õ(d) sample complexity even in this stringent setting.

Our main technique is a new approach to using the powerful Sum of Squares method (SoS) to design differentially private algorithms. “SoS proofs to algorithms” is a key theme in numerous recent works in high-dimensional algorithmic statistics: estimators which apparently require exponential running time, but whose analysis can be captured by low-degree Sum of Squares proofs, can be automatically turned into polynomial-time algorithms with the same provable guarantees. We demonstrate a similar “proofs to private algorithms” phenomenon: instances of the workhorse exponential mechanism which apparently require exponential time, but which can be analyzed with low-degree SoS proofs, can be automatically turned into polynomial-time differentially private algorithms. We prove a meta-theorem capturing this phenomenon, which we expect to be of broad use in private algorithm design. Our techniques also draw new connections between differentially private and robust statistics in high dimensions. In particular, viewed through our proofs-to-private-algorithms lens, several well-studied SoS proofs from recent works in algorithmic robust statistics directly yield key components of our differentially private mean estimation algorithm.

Authors are in alphabetical order.

UC Berkeley and MIT. samhop@mit.edu. Supported by a Miller Postdoctoral Fellowship and a Simons Postdoctoral Fellowship. Part of this work was conducted while visiting the Simons Institute for the Theory of Computing.

Cheriton School of Computer Science, University of Waterloo. g@csail.mit.edu. Supported by an NSERC Discovery Grant and a University of Waterloo Startup grant.

Cheriton School of Computer Science, University of Waterloo. m2majid@uwaterloo.ca. Supported by an NSERC Discovery Grant, a Graduate Excellence Award in Computer Science, a David R. Cheriton Graduate Scholarship, and a Waterloo CPI Cybersecurity and Privacy Excellence Graduate Scholarship.
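
For readers unfamiliar with the exponential mechanism named in the abstract, the following is a minimal sketch of its generic, brute-force form on a toy one-dimensional mean estimation problem. It is not the SoS-based algorithm of the paper. The function names, the clipping range [lo, hi], the candidate grid, and the score function are all assumptions chosen for illustration. The mechanism samples a candidate r with probability proportional to exp(ε · score(r) / (2Δ)), where Δ bounds how much any score can change when one sample is replaced; this satisfies pure ε-differential privacy.

```python
import math
import random


def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng):
    """Sample one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    scale = epsilon / (2.0 * sensitivity)
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(scale * (s - top)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def private_mean_1d(samples, lo, hi, epsilon, grid_size=201, rng=None):
    """Toy pure-DP mean estimate over a finite grid (illustrative only)."""
    rng = rng or random.Random()
    n = len(samples)
    # Clip each sample so that replacing one sample moves the mean
    # by at most (hi - lo) / n.
    clipped = [min(max(x, lo), hi) for x in samples]
    mean = sum(clipped) / n
    # Hypothetical candidate set: an evenly spaced grid on [lo, hi].
    candidates = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Score each candidate by its negated distance to the clipped mean.
    scores = [-abs(c - mean) for c in candidates]
    sensitivity = (hi - lo) / n
    return exponential_mechanism(candidates, scores, epsilon, sensitivity, rng)
```

In d dimensions, a naive grid of this kind has size exponential in d, which is the source of the exponential running time the abstract refers to.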
