Privately Learning Mixtures of Axis-Aligned Gaussians

We consider the problem of learning mixtures of Gaussians under the constraint of approximate differential privacy. We prove that Õ(kd log(1/δ)/(αε)) samples are sufficient to learn a mixture of k axis-aligned Gaussians in ℝ^d to within total variation distance α while satisfying (ε, δ)-differential privacy. This is the first result for privately learning mixtures of unbounded axis-aligned (or even unbounded univariate) Gaussians. If the covariance matrix of each of the Gaussians is the identity matrix, we show that Õ(kd/α + kd log(1/δ)/(αε)) samples are sufficient. Recently, the “local covering” technique of Bun, Kamath, Steinke, and Wu [BKSW19] was successfully used for privately learning high-dimensional Gaussians with a known covariance matrix, and it was extended to privately learning general high-dimensional Gaussians by Aden-Ali, Ashtiani, and Kamath [AAK21]. Given these positive results, this approach has been proposed as a promising direction for privately learning mixtures of Gaussians. Unfortunately, we show that this approach cannot succeed for mixtures. We instead design a new technique for privately learning mixture distributions. A class of distributions F is said to be list-decodable if there is an algorithm that, given “heavily corrupted” samples from f ∈ F, outputs a list of distributions F̂ such that one of the distributions in F̂ approximates f. We show that if F is privately list-decodable, then we can privately learn mixtures of distributions in F. Finally, we show that axis-aligned Gaussian distributions are privately list-decodable, thereby proving that mixtures of such distributions are privately learnable.
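
For concreteness, the list-decodability notion invoked above can be written out as a definition. The following is a paraphrase of the abstract's informal statement rather than the paper's exact formulation; the corruption parameter γ, accuracy parameter α, and list size L are placeholder symbols introduced here for illustration.

\begin{definition}[List-decodable distribution class; illustrative paraphrase]
Let $\gamma \in (0,1]$, $\alpha > 0$, and $L \in \mathbb{N}$. A class $\mathcal{F}$ is
$(\gamma, \alpha, L)$-list-decodable if there is an algorithm that, given i.i.d. samples from any
distribution $g = \gamma f + (1-\gamma) h$ with $f \in \mathcal{F}$ and $h$ arbitrary
(so only a $\gamma$ fraction of the data is ``uncorrupted''), outputs a list
$\widehat{\mathcal{F}}$ of at most $L$ distributions such that, with high probability,
$\min_{\hat f \in \widehat{\mathcal{F}}} d_{\mathrm{TV}}(f, \hat f) \le \alpha$.
The class is privately list-decodable if the algorithm additionally satisfies
$(\varepsilon, \delta)$-differential privacy.
\end{definition}

The relevance to mixtures is that samples from a mixture $\sum_{i=1}^{k} w_i f_i$ look, from the perspective of any single component $f_i$ with weight $w_i \ge \gamma$, like samples from a $\gamma$-corrupted version of $f_i$; a private list-decoder therefore yields candidate approximations for every component at once, and such candidates can then be combined into a candidate mixture, e.g. via private hypothesis selection [23].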

[1] Guy N. Rothblum et al. Boosting and Differential Privacy. FOCS, 2010.

[2] Kobbi Nissim et al. Simultaneous Private Learning of Multiple Concepts. ITCS, 2015.

[3] Luc Devroye et al. Combinatorial methods in density estimation. Springer Series in Statistics, 2001.

[4] Thomas Steinke et al. Between Pure and Approximate Differential Privacy. J. Priv. Confidentiality, 2015.

[5] Huanyu Zhang et al. Differentially Private Testing of Identity and Closeness of Discrete Distributions. NeurIPS, 2017.

[6] Kunal Talwar et al. On the geometry of differential privacy. STOC, 2010.

[7] Vishesh Karwa et al. Finite Sample Differentially Private Confidence Intervals. ITCS, 2017.

[8] Thomas Steinke et al. Tight Lower Bounds for Differentially Private Selection. FOCS, 2017.

[9] Prasad Raghavendra et al. List Decodable Subspace Recovery. COLT, 2020.

[10] Cynthia Dwork et al. Differential privacy and robust statistics. STOC, 2009.

[11] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[12] Feng Ruan et al. The Right Complexity Measure in Locally Private Estimation: It is not the Fisher Information. arXiv, 2018.

[13] Daniel M. Kane et al. List-Decodable Mean Estimation via Iterative Multi-Filtering. NeurIPS, 2020.

[14] Jonathan Ullman et al. Private Identity Testing for High-Dimensional Distributions. NeurIPS, 2019.

[15] K. Pearson. Contributions to the Mathematical Theory of Evolution. 1894.

[16] Hassan Ashtiani et al. On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians. ALT, 2020.

[17] Santosh S. Vempala et al. A discriminative framework for clustering via similarity functions. STOC, 2008.

[18] Thomas Steinke et al. Make Up Your Mind: The Price of Online Queries in Differential Privacy. SODA, 2016.

[19] Jonathan Ullman et al. Fingerprinting Codes and the Price of Approximate Differential Privacy. SIAM J. Comput., 2018.

[20] Daniel M. Kane et al. List-decodable robust mean estimation and learning mixtures of spherical Gaussians. STOC, 2017.

[21] Jonathan Ullman et al. Private Mean Estimation of Heavy-Tailed Distributions. COLT, 2020.

[22] Huanyu Zhang et al. Differentially Private Assouad, Fano, and Le Cam. ALT, 2020.

[23] Thomas Steinke et al. Private Hypothesis Selection. IEEE Transactions on Information Theory, 2019.

[24] Jonathan Ullman et al. CoinPress: Practical Private Mean and Covariance Estimation. NeurIPS, 2020.

[25] Moni Naor et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation. EUROCRYPT, 2006.

[26] Weihao Kong et al. Robust and Differentially Private Mean Estimation. NeurIPS, 2021.

[27] Andrew Bray et al. Differentially Private Confidence Intervals. arXiv, 2020.

[28] R. Reiss. Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics. 1989.

[29] Thomas Steinke et al. Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation. NeurIPS, 2019.

[30] Sidhanth Mohanty et al. List Decodable Mean Estimation in Nearly Linear Time. FOCS, 2020.

[31] Janardhan Kulkarni et al. Privately Learning Markov Random Fields. ICML, 2020.

[32] Chunming Qiao et al. Mutual Information Optimally Local Private Discrete Distribution Estimation. arXiv, 2016.

[33] Constantinos Daskalakis et al. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians. COLT, 2013.

[34] Alon Orlitsky et al. Near-Optimal-Sample Estimators for Spherical Gaussian Mixtures. NIPS, 2014.

[35] Ainesh Bakshi et al. List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time. SODA, 2020.

[36] Janardhan Kulkarni et al. Collecting Telemetry Data Privately. NIPS, 2017.

[37] Prasad Raghavendra et al. List Decodable Learning via Sum of Squares. SODA, 2019.

[38] Sofya Raskhodnikova et al. Smooth sensitivity and sampling in private data analysis. STOC, 2007.

[39] Úlfar Erlingsson et al. Prochlo: Strong Privacy for Analytics in the Crowd. SOSP, 2017.

[40] Jonathan Ullman et al. Differentially Private Algorithms for Learning Mixtures of Separated Gaussians. Information Theory and Applications Workshop (ITA), 2020.

[41] Martin J. Wainwright et al. Minimax Optimal Procedures for Locally Private Estimation. arXiv, 2016.

[42] Adam D. Smith et al. Privacy-preserving statistical estimation with optimal convergence rates. STOC, 2011.

[43] John Duchi et al. Lower Bounds for Locally Private Estimation via Communication Complexity. COLT, 2019.

[44] Maria-Florina Balcan et al. Agnostic Clustering. ALT, 2009.

[45] Thomas Steinke et al. Robust Traceability from Trace Amounts. FOCS, 2015.

[46] John C. Duchi et al. Privacy and Statistical Risk: Formalisms and Minimax Bounds. arXiv, 2014.

[47] Nina Mishra et al. Releasing search queries and clicks privately. WWW, 2009.

[48] Jonathan Ullman et al. A Primer on Private Statistics. arXiv, 2020.

[49] Roman Vershynin et al. High-Dimensional Probability. 2018.

[50] Shai Ben-David et al. Sample-Efficient Learning of Mixtures. AAAI, 2017.

[51] Frederick R. Forst et al. On robust estimation of the location parameter. 1980.

[52] Nicholas J. A. Harvey et al. Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes. J. ACM, 2017.

[53] Huanyu Zhang et al. Hadamard Response: Estimating Distributions Privately, Efficiently, and with Little Communication. AISTATS, 2018.

[54] Yichen Wang et al. The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy. The Annals of Statistics, 2019.

[55] Gregory Valiant et al. Learning from untrusted data. STOC, 2016.

[56] Adam R. Klivans et al. List-Decodable Linear Regression. NeurIPS, 2019.

[57] Ilias Diakonikolas et al. Differentially Private Learning of Structured Discrete Distributions. NIPS, 2015.

[58] Peter Kairouz et al. Discrete Distribution Estimation under Local Privacy. ICML, 2016.

[59] Janardhan Kulkarni et al. Locally Private Gaussian Estimation. NeurIPS, 2018.

[60] Úlfar Erlingsson et al. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS, 2014.