Private Robust Estimation by Stabilizing Convex Relaxations

We give the first polynomial time and sample (ε, δ)-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two properties well studied in prior work on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree-2 polynomials. Our recovery guarantees hold in the “right affine-invariant norms”: Mahalanobis distance for the mean, multiplicative spectral and relative Frobenius distance for the covariance, and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions. Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation so that they satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness during their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new “estimate dependent” noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance. We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators. Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions.

Carnegie Mellon University, Google Research
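To make the affine-invariant error measures named in the abstract concrete, the following is a minimal numerical sketch using their standard textbook definitions. It is not the paper's algorithm, and the function names (`inv_sqrt_psd`, `mahalanobis_error`, `relative_frobenius_error`, `relative_spectral_error`) are hypothetical names chosen for illustration. It assumes the true covariance is symmetric positive definite.

```python
import numpy as np


def inv_sqrt_psd(sigma):
    """Inverse square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def mahalanobis_error(mu_hat, mu, sigma):
    """Mahalanobis distance || Sigma^{-1/2} (mu_hat - mu) ||_2."""
    return float(np.linalg.norm(inv_sqrt_psd(sigma) @ (mu_hat - mu)))


def relative_frobenius_error(sigma_hat, sigma):
    """Relative Frobenius distance || Sigma^{-1/2} Sigma_hat Sigma^{-1/2} - I ||_F."""
    w = inv_sqrt_psd(sigma)
    dim = sigma.shape[0]
    return float(np.linalg.norm(w @ sigma_hat @ w - np.eye(dim), ord="fro"))


def relative_spectral_error(sigma_hat, sigma):
    """Smallest eta with (1 - eta) Sigma <= Sigma_hat <= (1 + eta) Sigma (Loewner order)."""
    w = inv_sqrt_psd(sigma)
    eigvals = np.linalg.eigvalsh(w @ sigma_hat @ w)
    return float(np.max(np.abs(eigvals - 1.0)))
```

Each quantity is unchanged when the data are mapped through an invertible affine transformation x ↦ Ax + b, under which μ becomes Aμ + b and Σ becomes AΣAᵀ. This is why guarantees of this form require no condition-number assumption on Σ.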

[1]  Pravesh Kothari,et al.  Robust moment estimation and improved clustering via sum of squares , 2018, STOC.

[2]  J. Lasserre.  New Positive Semidefinite Relaxations for Nonconvex Quadratic Programs , 2001.

[3]  Weihao Kong,et al.  Robust Meta-learning for Mixed Linear Regression with Small Batches , 2020, NeurIPS.

[4]  Didier Henrion,et al.  Strong duality in Lasserre’s hierarchy for polynomial optimization , 2014, Optim. Lett.

[5]  Samuel B. Hopkins,et al.  Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection , 2019, NeurIPS.

[6]  Daniel M. Kane,et al.  Robust Estimators in High Dimensions without the Computational Intractability , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[7]  Hassan Ashtiani,et al.  On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians , 2020, ALT.

[8]  Jerry Li,et al.  Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time , 2020, NeurIPS.

[9]  Marco Gaboardi,et al.  Covariance-Aware Private Mean Estimation Without Private Covariance Estimation , 2021, NeurIPS.

[10]  P. Parrilo.  Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization , 2000.

[11]  Vishesh Karwa,et al.  Finite Sample Differentially Private Confidence Intervals , 2017, ITCS.

[12]  Hassan Ashtiani,et al.  Private and polynomial time algorithms for learning Gaussians and beyond , 2021, ArXiv.

[13]  John M. Abowd,et al.  The U.S. Census Bureau Adopts Differential Privacy , 2018, KDD.

[14]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[15]  Santosh S. Vempala,et al.  Agnostic Estimation of Mean and Covariance , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[16]  Úlfar Erlingsson,et al.  Prochlo: Strong Privacy for Analytics in the Crowd , 2017, SOSP.

[17]  C. Dwork,et al.  Exposed! A Survey of Attacks on Private Data , 2017, Annual Review of Statistics and Its Application.

[18]  Pravesh Kothari,et al.  Outlier-Robust Clustering of Non-Spherical Mixtures , 2020, ArXiv.

[19]  Jonathan Ullman,et al.  Differentially Private Algorithms for Learning Mixtures of Separated Gaussians , 2019, 2020 Information Theory and Applications Workshop (ITA).

[20]  Jerry Li,et al.  Robustly Learning a Gaussian: Getting Optimal Error, Efficiently , 2017, SODA.

[21]  Janardhan Kulkarni,et al.  Privately Learning Markov Random Fields , 2020, ICML.

[22]  Jonathan Ullman,et al.  Private Mean Estimation of Heavy-Tailed Distributions , 2020, COLT.

[23]  Jerry Li,et al.  Mixture models, robustness, and sum of squares proofs , 2017, STOC.

[24]  Ilias Diakonikolas,et al.  Robustly Learning any Clusterable Mixture of Gaussians , 2020, ArXiv.

[25]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci.

[26]  Úlfar Erlingsson,et al.  RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response , 2014, CCS.

[27]  Jonathan Ullman,et al.  CoinPress: Practical Private Mean and Covariance Estimation , 2020, NeurIPS.

[28]  Weihao Kong,et al.  Robust and Differentially Private Mean Estimation , 2021, NeurIPS.

[29]  J. Gallier.  Quadratic Optimization Problems , 2020, Linear Algebra and Optimization with Applications to Machine Learning.

[30]  Yichen Wang,et al.  The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy , 2019, The Annals of Statistics.

[31]  Pravesh Kothari,et al.  Outlier-robust moment-estimation via sum-of-squares , 2017, ArXiv.

[32]  Banghua Zhu,et al.  Generalized Resilience and Robust Statistics , 2019, The Annals of Statistics.

[33]  Andrew Bray,et al.  Differentially Private Confidence Intervals , 2020, ArXiv.

[34]  Pravesh Kothari,et al.  Better Agnostic Clustering Via Relaxed Tensor Norms , 2017, ArXiv.

[35]  Sham M. Kakade,et al.  Learning mixtures of spherical gaussians: moment methods and spectral decompositions , 2012, ITCS '13.

[36]  Salil P. Vadhan,et al.  The Complexity of Differential Privacy , 2017, Tutorials on the Foundations of Cryptography.

[37]  Pravesh Kothari,et al.  Semialgebraic Proofs and Efficient Algorithm Design , 2019, Electron. Colloquium Comput. Complex.

[38]  Pravesh Kothari,et al.  Efficient Algorithms for Outlier-Robust Regression , 2018, COLT.

[39]  Cynthia Dwork,et al.  Differential privacy and robust statistics , 2009, STOC '09.

[40]  Janardhan Kulkarni,et al.  Collecting Telemetry Data Privately , 2017, NIPS.

[41]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[42]  Yurii Nesterov,et al.  Squared Functional Systems and Optimization Problems , 2000.

[43]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[44]  Jerry Li,et al.  Being Robust (in High Dimensions) Can Be Practical , 2017, ICML.

[45]  Thomas Steinke,et al.  Robust Traceability from Trace Amounts , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[46]  Gregory Valiant,et al.  Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers , 2017, ITCS.

[47]  Kevin Tian,et al.  Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing , 2020, NeurIPS.

[48]  Ryan O'Donnell,et al.  Hypercontractive inequalities via SOS, and the Frankl-Rödl graph , 2012, SODA.

[49]  Jerry Li,et al.  Privately Learning High-Dimensional Distributions , 2018, COLT.

[50]  Thomas Steinke,et al.  Private Hypothesis Selection , 2019, IEEE Transactions on Information Theory.

[51]  Samuel B. Hopkins,et al.  Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism , 2021, ArXiv.

[52]  David P. Woodruff,et al.  Faster Algorithms for High-Dimensional Robust Covariance Estimation , 2019, COLT.

[53]  Peter Manohar,et al.  Polynomial-Time Sum-of-Squares Can Robustly Estimate Mean and Covariance of Gaussians Optimally , 2021, ALT.

[54]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[55]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[56]  Thomas Steinke,et al.  A Private and Computationally-Efficient Estimator for Unbounded Gaussians , 2021, ArXiv.

[57]  Thomas Steinke,et al.  Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation , 2019, NeurIPS.

[58]  Khanh Dao Duc,et al.  Operator Norm Inequalities between Tensor Unfoldings on the Partition Lattice , 2016, Linear Algebra and its Applications.

[59]  David Steurer,et al.  Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method , 2014, STOC.

[60]  Lieven De Lathauwer,et al.  Fourth-Order Cumulant-Based Blind Identification of Underdetermined Mixtures , 2007, IEEE Transactions on Signal Processing.

[61]  Samuel B. Hopkins.  Mean estimation with sub-Gaussian rates in polynomial time , 2018, The Annals of Statistics.

[62]  Daniel M. Kane,et al.  Recent Advances in Algorithmic High-Dimensional Robust Statistics , 2019, ArXiv.

[63]  Samuel B. Hopkins,et al.  Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization , 2020, NeurIPS.

[64]  Ilias Diakonikolas,et al.  Outlier-Robust Clustering of Gaussians and Other Non-Spherical Mixtures , 2020, 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS).

[65]  Jonathan Ullman,et al.  Private Identity Testing for High-Dimensional Distributions , 2019, NeurIPS.

[66]  Weihao Kong,et al.  Differential privacy and robust statistics in high dimensions , 2021, COLT.

[67]  Jinhui Xu,et al.  On Differentially Private Stochastic Convex Optimization with Heavy-tailed Data , 2020, ICML.

[68]  Ainesh Bakshi,et al.  Robust linear regression: optimal rates in polynomial time , 2020, STOC.

[69]  Jonathan Ullman,et al.  Fingerprinting Codes and the Price of Approximate Differential Privacy , 2018, SIAM J. Comput.

[70]  Thomas Steinke,et al.  Interactive fingerprinting codes and the hardness of preventing false discovery , 2014, 2016 Information Theory and Applications Workshop (ITA).

[71]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.