Privacy-Preserving Parametric Inference: A Case for Robust Statistics

Abstract. Differential privacy is a cryptographically motivated approach to privacy that has become a very active research area in theoretical computer science and machine learning over the last decade. In this paradigm, one assumes that a trusted curator holds the data of individuals in a database; the goal is to protect individual data while still allowing the release of global characteristics of the database. In this setting, we introduce a general framework for parametric inference with differential privacy guarantees. We first obtain differentially private estimators based on bounded-influence M-estimators, using their gross-error sensitivity to calibrate the noise term that is added to ensure privacy. We then show that a similar construction yields differentially private test statistics analogous to the Wald, score, and likelihood ratio tests. We provide statistical guarantees for all our proposals via an asymptotic analysis. Our results also further clarify the connection between differential privacy and robust statistics. In particular, we demonstrate that differential privacy is a weaker stability requirement than infinitesimal robustness, and show that robust M-estimators can be easily randomized to guarantee both differential privacy and robustness to contaminated data. We illustrate our results on both simulated and real data. Supplementary materials for this article are available online.
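The idea of calibrating privacy noise to the gross-error sensitivity of a bounded-influence M-estimator can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the article's calibration: the Huber location estimator with known unit scale, the tuning constant `c = 1.345`, Laplace noise, and the scale `gamma / (n * epsilon)`. The function names are hypothetical, and the sketch carries no formal privacy guarantee on its own.

```python
import numpy as np


def huber_location(x, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimator of location (unit scale assumed), via reweighting."""
    theta = np.median(x)
    for _ in range(max_iter):
        r = x - theta
        # Weights psi(r) / r for the Huber psi function: 1 inside [-c, c],
        # c / |r| outside, so large residuals are downweighted.
        w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        new_theta = np.sum(w * x) / np.sum(w)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta


def empirical_gross_error_sensitivity(x, theta, c=1.345):
    """sup |psi| divided by the empirical mean of psi', i.e. c / P_n(|x - theta| <= c)."""
    frac_inside = np.mean(np.abs(x - theta) <= c)
    # Guard against division by zero when no point falls inside [-c, c].
    return c / max(frac_inside, 1.0 / len(x))


def noisy_huber_location(x, epsilon, c=1.345, rng=None):
    """Huber estimate plus Laplace noise scaled by gamma / (n * epsilon).

    ASSUMPTION: the noise distribution and scaling are illustrative only;
    they are not taken from the article.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    theta = huber_location(x, c)
    gamma = empirical_gross_error_sensitivity(x, theta, c)
    scale = gamma / (len(x) * epsilon)
    return theta + rng.laplace(0.0, scale), theta, scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 2000 clean observations around 2, plus 20 gross outliers.
    data = np.concatenate([rng.normal(2.0, 1.0, 2000), np.full(20, 1000.0)])
    private, plain, scale = noisy_huber_location(data, epsilon=1.0, rng=rng)
    print(f"Huber estimate: {plain:.3f}, noisy release: {private:.3f}, noise scale: {scale:.5f}")
```

Because the Huber psi function is bounded, the influence of any single observation on the estimate is bounded as well, so the added noise shrinks at rate 1/n while the estimate stays close to the bulk of the data even with gross outliers present.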
