Private Mean Estimation of Heavy-Tailed Distributions

We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments. Roughly speaking, in the univariate case, we show that $n = \Theta\left(\frac{1}{\alpha^2} + \frac{1}{\alpha^{\frac{k}{k-1}}\varepsilon}\right)$ samples are necessary and sufficient to estimate the mean to $\alpha$-accuracy under $\varepsilon$-differential privacy, or any of its common relaxations. This result shows qualitatively different behavior from estimation without privacy constraints, for which the sample complexity is identical for all $k \geq 2$. We also give algorithms for the multivariate setting whose sample complexity is a factor of $O(d)$ larger than in the univariate case.
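To make the dependence on $k$ concrete, the following minimal sketch evaluates the two terms of the univariate bound with all hidden constants set to 1. The function names `statistical_term`, `privacy_term`, and `sample_complexity` are hypothetical and introduced only for illustration; the outputs are order-of-magnitude quantities, not exact sample sizes.

```python
def statistical_term(alpha: float) -> float:
    """Non-private term 1 / alpha^2; it does not depend on k."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / alpha**2


def privacy_term(alpha: float, epsilon: float, k: float) -> float:
    """Privacy term 1 / (alpha^(k/(k-1)) * epsilon), for k >= 2."""
    if alpha <= 0 or epsilon <= 0:
        raise ValueError("alpha and epsilon must be positive")
    if k < 2:
        raise ValueError("k must be at least 2")
    return 1.0 / (alpha ** (k / (k - 1)) * epsilon)


def sample_complexity(alpha: float, epsilon: float, k: float) -> float:
    """Sum of both terms, with the Theta(.) constants assumed to be 1."""
    return statistical_term(alpha) + privacy_term(alpha, epsilon, k)


# Illustrative sweep over k at fixed accuracy and privacy level.
for k in (2, 3, 4, 10, 100):
    n = sample_complexity(alpha=0.1, epsilon=1.0, k=k)
    print(f"k={k:>3}: n is on the order of {n:.1f}")
```

With $\alpha = 0.1$ and $\varepsilon = 1$, the privacy term matches the statistical term at $k = 2$ (about 200 in total under the unit-constant assumption) and shrinks toward $1/(\alpha\varepsilon)$ as $k$ grows (about 110 in total), while the non-private term stays fixed at $1/\alpha^2$.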
