Privately Learning Mixtures of Axis-Aligned Gaussians

We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We prove that Õ(kd log(1/δ)/(αε)) samples are sufficient to learn a mixture of k axis-aligned Gaussians in ℝ^d to within total variation distance α while satisfying (ε, δ)-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrix of each of the Gaussians is the identity matrix, we show that Õ(kd/α + kd log(1/δ)/(αε)) samples are sufficient. Recently, the “local covering” technique of Bun, Kamath, Steinke, and Wu [BKSW19] was successfully used for privately learning high-dimensional Gaussians with a known covariance matrix, and it was extended to privately learning general high-dimensional Gaussians by Aden-Ali, Ashtiani, and Kamath [AAK21]. Given these positive results, this approach has been proposed as a promising direction for privately learning mixtures of Gaussians. Unfortunately, we show that this approach cannot succeed for mixtures. We instead design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given “heavily corrupted” samples from f ∈ F, outputs a list of distributions F̂ such that one of the distributions in F̂ approximates f. We show that if F is privately list-decodable, then we can privately learn mixtures of distributions in F. Finally, we show that axis-aligned Gaussian distributions are privately list-decodable, thereby proving that mixtures of such distributions are privately learnable.
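
For concreteness, the list-decodability notion invoked above can be written out as a definition. The following is a paraphrase of the abstract's informal statement rather than the paper's exact formulation; the corruption parameter γ, accuracy parameter α, and list size L are placeholder symbols introduced here for illustration.

\begin{definition}[List-decodable distribution class; illustrative paraphrase]
Let $\gamma \in (0,1]$, $\alpha > 0$, and $L \in \mathbb{N}$. A class $\mathcal{F}$ is
$(\gamma, \alpha, L)$-list-decodable if there is an algorithm that, given i.i.d. samples from any
distribution $g = \gamma f + (1-\gamma) h$ with $f \in \mathcal{F}$ and $h$ arbitrary
(so only a $\gamma$ fraction of the data is ``uncorrupted''), outputs a list
$\widehat{\mathcal{F}}$ of at most $L$ distributions such that, with high probability,
$\min_{\hat f \in \widehat{\mathcal{F}}} d_{\mathrm{TV}}(f, \hat f) \le \alpha$.
The class is privately list-decodable if the algorithm additionally satisfies
$(\varepsilon, \delta)$-differential privacy.
\end{definition}

The relevance to mixtures is that samples from a mixture $\sum_{i=1}^{k} w_i f_i$ look, from the perspective of any single component $f_i$ with weight $w_i \ge \gamma$, like samples from a $\gamma$-corrupted version of $f_i$; a private list-decoder therefore yields candidate approximations for every component at once, and such candidates can then be combined into a candidate mixture, e.g. via private hypothesis selection [23].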

[1] Guy N. Rothblum et al. Boosting and Differential Privacy. FOCS, 2010.

[2] Kobbi Nissim et al. Simultaneous Private Learning of Multiple Concepts. ITCS, 2015.

[3] Luc Devroye et al. Combinatorial methods in density estimation. Springer Series in Statistics, 2001.

[4] Thomas Steinke et al. Between Pure and Approximate Differential Privacy. J. Priv. Confidentiality, 2015.

[5] Huanyu Zhang et al. Differentially Private Testing of Identity and Closeness of Discrete Distributions. NeurIPS, 2017.

[6] Kunal Talwar et al. On the geometry of differential privacy. STOC, 2010.

[7] Vishesh Karwa et al. Finite Sample Differentially Private Confidence Intervals. ITCS, 2017.

[8] Thomas Steinke et al. Tight Lower Bounds for Differentially Private Selection. FOCS, 2017.

[9] Prasad Raghavendra et al. List Decodable Subspace Recovery. COLT, 2020.

[10] Cynthia Dwork et al. Differential privacy and robust statistics. STOC, 2009.

[11] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[12] Feng Ruan et al. The Right Complexity Measure in Locally Private Estimation: It is not the Fisher Information. arXiv, 2018.

[13] Daniel M. Kane et al. List-Decodable Mean Estimation via Iterative Multi-Filtering. NeurIPS, 2020.

[14] Jonathan Ullman et al. Private Identity Testing for High-Dimensional Distributions. NeurIPS, 2019.

[15] K. Pearson. Contributions to the Mathematical Theory of Evolution. 1894.

[16] Hassan Ashtiani et al. On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians. ALT, 2020.

[17] Santosh S. Vempala et al. A discriminative framework for clustering via similarity functions. STOC, 2008.

[18] Thomas Steinke et al. Make Up Your Mind: The Price of Online Queries in Differential Privacy. SODA, 2016.

[19] Jonathan Ullman et al. Fingerprinting Codes and the Price of Approximate Differential Privacy. SIAM J. Comput., 2018.

[20] Daniel M. Kane et al. List-decodable robust mean estimation and learning mixtures of spherical Gaussians. STOC, 2017.

[21] Jonathan Ullman et al. Private Mean Estimation of Heavy-Tailed Distributions. COLT, 2020.

[22] Huanyu Zhang et al. Differentially Private Assouad, Fano, and Le Cam. ALT, 2020.

[23] Thomas Steinke et al. Private Hypothesis Selection. IEEE Transactions on Information Theory, 2019.

[24] Jonathan Ullman et al. CoinPress: Practical Private Mean and Covariance Estimation. NeurIPS, 2020.

[25] Moni Naor et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation. EUROCRYPT, 2006.

[26] Weihao Kong et al. Robust and Differentially Private Mean Estimation. NeurIPS, 2021.

[27] Andrew Bray et al. Differentially Private Confidence Intervals. arXiv, 2020.

[28] R. Reiss. Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics. 1989.

[29] Thomas Steinke et al. Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation. NeurIPS, 2019.

[30] Sidhanth Mohanty et al. List Decodable Mean Estimation in Nearly Linear Time. FOCS, 2020.

[31] Janardhan Kulkarni et al. Privately Learning Markov Random Fields. ICML, 2020.

[32] Chunming Qiao et al. Mutual Information Optimally Local Private Discrete Distribution Estimation. arXiv, 2016.

[33] Constantinos Daskalakis et al. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians. COLT, 2013.

[34] Alon Orlitsky et al. Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures. NIPS, 2014.

[35] Ainesh Bakshi et al. List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time. SODA, 2020.

[36] Janardhan Kulkarni et al. Collecting Telemetry Data Privately. NIPS, 2017.

[37] Prasad Raghavendra et al. List Decodable Learning via Sum of Squares. SODA, 2019.

[38] Sofya Raskhodnikova et al. Smooth sensitivity and sampling in private data analysis. STOC, 2007.

[39] Úlfar Erlingsson et al. Prochlo: Strong Privacy for Analytics in the Crowd. SOSP, 2017.

[40] Jonathan Ullman et al. Differentially Private Algorithms for Learning Mixtures of Separated Gaussians. Information Theory and Applications Workshop (ITA), 2020.

[41] Martin J. Wainwright et al. Minimax Optimal Procedures for Locally Private Estimation. arXiv, 2016.

[42] Adam D. Smith et al. Privacy-preserving statistical estimation with optimal convergence rates. STOC, 2011.

[43] John Duchi et al. Lower Bounds for Locally Private Estimation via Communication Complexity. COLT, 2019.

[44] Maria-Florina Balcan et al. Agnostic Clustering. ALT, 2009.

[45] Thomas Steinke et al. Robust Traceability from Trace Amounts. FOCS, 2015.

[46] John C. Duchi et al. Privacy and Statistical Risk: Formalisms and Minimax Bounds. arXiv, 2014.

[47] Nina Mishra et al. Releasing search queries and clicks privately. WWW, 2009.

[48] Jonathan Ullman et al. A Primer on Private Statistics. arXiv, 2020.

[49] Roman Vershynin et al. High-Dimensional Probability. 2018.

[50] Shai Ben-David et al. Sample-Efficient Learning of Mixtures. AAAI, 2017.

[51] Frederick R. Forst et al. On robust estimation of the location parameter. 1980.

[52] Nicholas J. A. Harvey et al. Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes. J. ACM, 2017.

[53] Huanyu Zhang et al. Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication. AISTATS, 2018.

[54] Yichen Wang et al. The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy. The Annals of Statistics, 2019.

[55] Gregory Valiant et al. Learning from untrusted data. STOC, 2016.

[56] Adam R. Klivans et al. List-Decodable Linear Regression. NeurIPS, 2019.

[57] Ilias Diakonikolas et al. Differentially Private Learning of Structured Discrete Distributions. NIPS, 2015.

[58] Peter Kairouz et al. Discrete Distribution Estimation under Local Privacy. ICML, 2016.

[59] Janardhan Kulkarni et al. Locally Private Gaussian Estimation. NeurIPS, 2018.

[60] Úlfar Erlingsson et al. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS, 2014.